WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72
[default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type.
[default0]:using torch.bfloat16 for parameters ...
[default0]:------------------------ arguments ------------------------
[default0]: abort_on_unmet_fused_kernel_constraints ......... True
[default0]: accumulate_allreduce_grads_in_fp32 .............. True
[default0]: adam_beta1 ...................................... 0.9
[default0]: adam_beta2 ...................................... 0.95
[default0]: adam_eps ........................................ 1e-08
[default0]: adlr_autoresume ................................. False
[default0]: adlr_autoresume_interval ........................ 1000
[default0]: apply_query_key_layer_scaling ................... True
[default0]: apply_residual_connection_post_layernorm ........ False
[default0]: attention_dropout ............................... 0.1
[default0]: attention_softmax_in_fp32 ....................... False
[default0]: bert_binary_head ................................ True
[default0]: bert_load ....................................... None
[default0]: bf16 ............................................ True
[default0]: bias_dropout_fusion ............................. True
[default0]: bias_gelu_fusion ................................ True
[default0]: biencoder_projection_dim ........................ 0
[default0]: biencoder_shared_query_context_model ............ False
[default0]: block_data_path ................................. None
[default0]: checkpoint_activations .......................... True
[default0]: checkpoint_in_cpu ............................... False
[default0]: checkpoint_num_layers ........................... 1
[default0]: clip_grad ....................................... 1.0
[default0]: codecarbon_dir .................................. None
[default0]: consumed_train_samples .......................... 0
[default0]: consumed_train_tokens ........................... 0
[default0]: consumed_valid_samples .......................... 0
[default0]: contigious_checkpointing ........................ False
[default0]: cpu_optimizer ................................... False
[default0]: cpu_torch_adam .................................. False
[default0]: curriculum_learning ............................. False
[default0]: data_impl ....................................... mmap
[default0]: data_parallel_size .............................. 4
[default0]: data_path ....................................... None
[default0]: dataloader_type ................................. single
[default0]: DDP_impl ........................................ local
[default0]: decoder_seq_length .............................. None
[default0]: deepscale ....................................... False
[default0]: deepscale_config ................................ None
[default0]: deepspeed ....................................... True
[default0]: deepspeed_activation_checkpointing .............. True
[default0]: deepspeed_config ................................ ./ds_config.927259.json
[default0]: deepspeed_mpi ................................... False
[default0]: distribute_checkpointed_activations ............. False
[default0]: distributed_backend ............................. nccl
[default0]: embed_layernorm ................................. True
[default0]: embedding_path .................................. None
[default0]: encoder_seq_length .............................. 2048
[default0]: eod_mask_loss ................................... False
[default0]: eval_interval ................................... 250
[default0]: eval_iters ...................................... 5
[default0]: eval_only ....................................... None
[default0]: evidence_data_path .............................. None
[default0]: exit_duration_in_mins ........................... 5990
[default0]: exit_interval ................................... None
[default0]: ffn_hidden_size ................................. 57344
[default0]: finetune ........................................ False
[default0]: fp16 ............................................ False
[default0]: fp16_lm_cross_entropy ........................... False
[default0]: fp32_residual_connection ........................ False
[default0]: gigaflos_no_embeds .............................. 0
[default0]: global_batch_size ............................... 2048
[default0]: glu_activation .................................. None
[default0]: hidden_dropout .................................. 0.1
[default0]: hidden_size ..................................... 14336
[default0]: hysteresis ...................................... 2
[default0]: ict_head_size ................................... None
[default0]: ict_load ........................................ None
[default0]: img_dim ......................................... 224
[default0]: indexer_batch_size .............................. 128
[default0]: indexer_log_interval ............................ 1000
[default0]: inference ....................................... False
[default0]: init_method_std ................................. 0.0048
[default0]: init_method_xavier_uniform ...................... False
[default0]: initial_loss_scale .............................. 4294967296
[default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf
[default0]: kv_channels ..................................... 128
[default0]: layernorm_epsilon ............................... 1e-05
[default0]: lazy_mpu_init ................................... None
[default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: local_rank ...................................... None
[default0]: log_batch_size_to_tensorboard ................... True
[default0]: log_interval .................................... 1
[default0]: log_learning_rate_to_tensorboard ................ True
[default0]: log_level ....................................... None
[default0]: log_level_replica ............................... None
[default0]: log_loss_scale_to_tensorboard ................... True
[default0]: log_num_zeros_in_grad ........................... False
[default0]: log_params_norm ................................. False
[default0]: log_path ........................................ None
[default0]: log_timers_to_tensorboard ....................... True
[default0]: log_validation_ppl_to_tensorboard ............... True
[default0]: loss_on_targets_only ............................ False
[default0]: loss_scale ...................................... None
[default0]: loss_scale_window ............................... 1000
[default0]: lr .............................................. 2e-05
[default0]: lr_decay_iters .................................. None
[default0]: lr_decay_samples ................................ None
[default0]: lr_decay_style .................................. constant
[default0]: lr_decay_tokens ................................. None
[default0]: lr_warmup_fraction .............................. None
[default0]: lr_warmup_iters ................................. 0
[default0]: lr_warmup_samples ............................... 0
[default0]: make_vocab_size_divisible_by .................... 128
[default0]: mask_prob ....................................... 0.15
[default0]: masked_softmax_fusion ........................... True
[default0]: max_position_embeddings ......................... 2048
[default0]: mean_noise_span_length .......................... None
[default0]: memory_centric_tiled_linear ..................... False
[default0]: merge_file ...................................... None
[default0]: micro_batch_size ................................ 1
[default0]: min_loss_scale .................................. 1.0
[default0]: min_lr .......................................... 0.0
[default0]: mmap_warmup ..................................... False
[default0]: no_load_optim ................................... True
[default0]: no_load_rng ..................................... None
[default0]: no_save_optim ................................... None
[default0]: no_save_rng ..................................... None
[default0]: noise_density ................................... None
[default0]: norm_target_loss ................................ True
[default0]: num_attention_heads ............................. 112
[default0]: num_channels .................................... 3
[default0]: num_classes ..................................... 1000
[default0]: num_layers ...................................... 70
[default0]: num_layers_per_virtual_pipeline_stage ........... None
[default0]: num_workers ..................................... 2
[default0]: onnx_safe ....................................... None
[default0]: openai_gelu ..................................... False
[default0]: optimizer ....................................... adam
[default0]: override_lr_scheduler ........................... False
[default0]: pad_vocab_size_to ............................... 250880
[default0]: params_dtype .................................... torch.bfloat16
[default0]: partition_activations ........................... False
[default0]: patch_dim ....................................... 16
[default0]: pipeline_model_parallel_size .................... 72
[default0]: position_embedding_type ......................... PositionEmbeddingType.alibi
[default0]: pp_partition_method ............................. type:transformer|embedding
[default0]: prefixlm ........................................ False
[default0]: profile_backward ................................ False
[default0]: query_in_block_prob ............................. 0.1
[default0]: rampup_batch_size ............................... None
[default0]: rank ............................................ 0
[default0]: remote_device ................................... none
[default0]: reset_attention_mask ............................ False
[default0]: reset_position_ids .............................. False
[default0]: reset_progress .................................. True
[default0]: retriever_report_topk_accuracies ................ []
[default0]: retriever_score_scaling ......................... False
[default0]: retriever_seq_length ............................ 256
[default0]: reweight_loss_based_on_position_frequency ....... False
[default0]: sample_rate ..................................... 1.0
[default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: save_interval ................................... 5
[default0]: scatter_gather_tensors_in_pipeline .............. True
[default0]: scattered_embeddings ............................ False
[default0]: seed ............................................ 42
[default0]: seq_length ...................................... 2048
[default0]: sgd_momentum .................................... 0.9
[default0]: short_seq_prob .................................. 0.1
[default0]: skip_train_iteration_range ...................... None
[default0]: split ........................................... None
[default0]: split_transformers .............................. False
[default0]: sync_tp_duplicated_parameters ................... True
[default0]: synchronize_each_layer .......................... False
[default0]: tensor_model_parallel_size ...................... 1
[default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq
[default0]: tensorboard_log_interval ........................ 1
[default0]: tensorboard_queue_size .......................... 5
[default0]: test_weighted_split_paths ....................... None
[default0]: test_weighted_split_paths_path .................. None
[default0]: tile_factor ..................................... 1
[default0]: titles_data_path ................................ None
[default0]: tokenizer_name_or_path .......................... bigscience/tokenizer
[default0]: tokenizer_type .................................. PretrainedFromHF
[default0]: train_iters ..................................... None
[default0]: train_samples ................................... 6348800
[default0]: train_tokens .................................... None
[default0]: train_weighted_split_names ...................... ['train']
[default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']]
[default0]: train_weighted_split_paths_path ................. None
[default0]: train_weighted_split_splits ..................... [['0:1']]
[default0]: train_weighted_split_weights .................... [['1']]
[default0]: universal_checkpoint ............................ True
[default0]: use_bnb_optimizer ............................... False
[default0]: use_checkpoint_lr_scheduler ..................... False
[default0]: use_contiguous_buffers_in_ddp ................... True
[default0]: use_cpu_initialization .......................... None
[default0]: use_one_sent_docs ............................... False
[default0]: use_pin_memory .................................. False
[default0]: valid_num_workers ............................... 2
[default0]: valid_weighted_split_names ...................... 
['validation_pretraining', 'valid_ar', 'valid_ca', 'valid_code', 'valid_en', 'valid_es', 'valid_eu', 'valid_fr', 'valid_id', 'valid_indic-as', 'valid_indic-bn', 'valid_indic-gu', 'valid_indic-hi', 'valid_indic-kn', 'valid_indic-ml', 'valid_indic-mr', 'valid_indic-ne', 'valid_indic-or', 'valid_indic-pa', 'valid_indic-ta', 'valid_indic-te', 'valid_indic-ur', 'valid_nigercongo-all', 'valid_oscar-en', 'valid_oscar-zh', 'valid_pt', 'valid_vi', 'valid_zhs', 'valid_zht', 'valid'] [default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... 
[['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... [['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... 
None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default7]:> setting tensorboard ... 
[default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-03 18:48:36,890] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-03 18:48:43,510] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.093 seconds [default0]:> compiling and loading fused kernels ... 
[default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.036 seconds [default0]:time to initialize megatron (seconds): -13.348 [default0]:[after megatron is initialized] datetime: 2022-09-03 18:48:50 [default0]:building GPT model ... 
[default0]:[2022-09-03 18:48:50,684] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-03 18:48:50,684] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-03 18:48:50,685] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.99 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-03 18:48:54,552] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe 
[default0]:stage=7 layers=1 [default0]: 9: ParallelTransformerLayerPipe [default0]:stage=8 layers=1 [default0]: 10: ParallelTransformerLayerPipe [default0]:stage=9 layers=1 [default0]: 11: ParallelTransformerLayerPipe [default0]:stage=10 layers=1 [default0]: 12: ParallelTransformerLayerPipe [default0]:stage=11 layers=1 [default0]: 13: ParallelTransformerLayerPipe [default0]:stage=12 layers=1 [default0]: 14: ParallelTransformerLayerPipe [default0]:stage=13 layers=1 [default0]: 15: ParallelTransformerLayerPipe [default0]:stage=14 layers=1 [default0]: 16: ParallelTransformerLayerPipe [default0]:stage=15 layers=1 [default0]: 17: ParallelTransformerLayerPipe [default0]:stage=16 layers=1 [default0]: 18: ParallelTransformerLayerPipe [default0]:stage=17 layers=1 [default0]: 19: ParallelTransformerLayerPipe [default0]:stage=18 layers=1 [default0]: 20: ParallelTransformerLayerPipe [default0]:stage=19 layers=1 [default0]: 21: ParallelTransformerLayerPipe [default0]:stage=20 layers=1 [default0]: 22: ParallelTransformerLayerPipe [default0]:stage=21 layers=1 [default0]: 23: ParallelTransformerLayerPipe [default0]:stage=22 layers=1 [default0]: 24: ParallelTransformerLayerPipe [default0]:stage=23 layers=1 [default0]: 25: ParallelTransformerLayerPipe [default0]:stage=24 layers=1 [default0]: 26: ParallelTransformerLayerPipe [default0]:stage=25 layers=1 [default0]: 27: ParallelTransformerLayerPipe [default0]:stage=26 layers=1 [default0]: 28: ParallelTransformerLayerPipe [default0]:stage=27 layers=1 [default0]: 29: ParallelTransformerLayerPipe [default0]:stage=28 layers=1 [default0]: 30: ParallelTransformerLayerPipe [default0]:stage=29 layers=1 [default0]: 31: ParallelTransformerLayerPipe [default0]:stage=30 layers=1 [default0]: 32: ParallelTransformerLayerPipe [default0]:stage=31 layers=1 [default0]: 33: ParallelTransformerLayerPipe [default0]:stage=32 layers=1 [default0]: 34: ParallelTransformerLayerPipe [default0]:stage=33 layers=1 [default0]: 35: ParallelTransformerLayerPipe 
[default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe 
[default0]:stage=61 layers=1 [default0]: 63: ParallelTransformerLayerPipe [default0]:stage=62 layers=1 [default0]: 64: ParallelTransformerLayerPipe [default0]:stage=63 layers=1 [default0]: 65: ParallelTransformerLayerPipe [default0]:stage=64 layers=1 [default0]: 66: ParallelTransformerLayerPipe [default0]:stage=65 layers=1 [default0]: 67: ParallelTransformerLayerPipe [default0]:stage=66 layers=1 [default0]: 68: ParallelTransformerLayerPipe [default0]:stage=67 layers=1 [default0]: 69: ParallelTransformerLayerPipe [default0]:stage=68 layers=1 [default0]: 70: ParallelTransformerLayerPipe [default0]:stage=69 layers=1 [default0]: 71: ParallelTransformerLayerPipe [default0]:stage=70 layers=3 [default0]: 72: ParallelTransformerLayerPipe [default0]: 73: undo [default0]: 74: MixedFusedLayerNorm [default0]:stage=71 layers=2 [default0]: 75: EmbeddingPipe [default0]: 76: float16_to_fp32 [default0]: loss: CrossEntropy [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default2]:Building extension module utils...
[default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.3338339328765869 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.33382296562194824 seconds
[default2]:ninja: no work to do.
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.3339042663574219 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.3338007926940918 seconds
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.39778971672058105 seconds
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.3989591598510742 seconds
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.398388147354126 seconds
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.3980100154876709 seconds
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step...
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.09192013740539551 seconds
[default0]:No modifications detected for re-loaded extension module utils, skipping build step...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.09205126762390137 seconds
[default2]:No modifications detected for re-loaded extension module utils, skipping build step...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.09187126159667969 seconds
[default3]:No modifications detected for re-loaded extension module utils, skipping build step...
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.09194827079772949 seconds
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:[2022-09-03 18:48:56,216] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-03 18:48:56,217] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-03 18:48:56,217] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.38 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-03 18:48:56,218] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:No modifications detected for re-loaded extension module utils, skipping build step...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.06499505043029785 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0649421215057373 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.06506085395812988 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.06510210037231445 seconds [default0]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default0]:Building extension module utils... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.386523962020874 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.339763879776001 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3400406837463379 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3399643898010254 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3785996437072754 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3355135917663574 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.33499693870544434 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3349893093109131 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.35889697074890137 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.35907411575317383 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.37868714332580566 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3786013126373291 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3352973461151123 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3179490566253662 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3254404067993164 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.32595252990722656 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3255300521850586 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.38705015182495117 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.33954763412475586 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.314133882522583 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31430768966674805 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3578474521636963 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4196460247039795 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.38728976249694824 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3252739906311035 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31516528129577637 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3146226406097412 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3096039295196533 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3143429756164551 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3096139430999756 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.31424689292907715 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31438636779785156 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.41947388648986816 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.313582181930542 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.34975504875183105 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3140590190887451 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.34932374954223633 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3143913745880127 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31754541397094727 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3262314796447754 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3262481689453125 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.419525146484375 seconds [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default7]:Loading extension module utils... 
[default4]:Time to load utils op: 0.3262195587158203 seconds [default5]:Time to load utils op: 0.3262314796447754 seconds [default7]:Time to load utils op: 0.4115419387817383 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4115302562713623 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.41153550148010254 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3489491939544678 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.34899306297302246 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3789348602294922 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4192991256713867 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4115331172943115 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3171844482421875 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3873586654663086 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3872818946838379 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4043409824371338 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3579113483428955 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30959129333496094 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3095991611480713 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30957627296447754 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3096280097961426 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.3549306392669678 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.32996249198913574 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35613083839416504 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4201390743255615 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3563680648803711 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4201235771179199 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31696295738220215 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.313586950302124 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3561842441558838 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3570406436920166 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31404876708984375 seconds [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4201362133026123 seconds [default6]:Time to load utils op: 0.42013120651245117 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.309598445892334 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3095972537994385 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3286764621734619 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37066006660461426 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.34320497512817383 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.34360337257385254 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3290901184082031 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.34314918518066406 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.34317827224731445 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4255399703979492 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.44414591789245605 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3537101745605469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00051116943359375 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4445195198059082 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4038505554199219 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4246029853820801 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4355301856994629 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4355306625366211 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3878786563873291 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.38786792755126953 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3864569664001465 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.40424108505249023 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.425264835357666 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3302302360534668 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3303952217102051 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4247603416442871 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3706636428833008 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.42472004890441895 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3585798740386963 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.37065911293029785 seconds [default1]:Time to load utils op: 0.3706650733947754 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.33581066131591797 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3286457061767578 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3350982666015625 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3165414333343506 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3757030963897705 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3357973098754883 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4445154666900635 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31649041175842285 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.3175485134124756 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3172428607940674 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31420063972473145 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3145279884338379 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3350646495819092 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3291642665863037 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35335803031921387 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3357884883880615 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.40431737899780273 seconds [default6]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.329559326171875 seconds [default6]:Time to load utils op: 0.3389852046966553 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.310408353805542 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3357880115509033 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.33899688720703125 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3560914993286133 seconds [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default4]:Time to load utils op: 0.3390026092529297 seconds [default3]:Time to load utils op: 0.3296539783477783 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.4355299472808838 seconds [default2]:Time to load utils op: 0.3296821117401123 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3389914035797119 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4245753288269043 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31159377098083496 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4245791435241699 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.355426549911499 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3138408660888672 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.336092472076416 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.39241576194763184 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3924126625061035 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3924140930175781 seconds [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.39242100715637207 seconds [default7]:Time to load utils op: 0.33498549461364746 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3227710723876953 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3097963333129883 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4124927520751953 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3416867256164551 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.41249656677246094 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.3413534164428711 seconds [default5]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35336756706237793 seconds [default5]:Time to load utils op: 0.4125175476074219 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.32984495162963867 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3350811004638672 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3116025924682617 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3103954792022705 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3103926181793213 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31160402297973633 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.35484910011291504 seconds [default4]:Time to load utils op: 0.35576295852661133 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3115806579589844 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3104236125946045 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3104121685028076 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3304281234741211 seconds [default2]:Time to load utils op: 0.33036327362060547 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.310410737991333 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31041550636291504 seconds [default1]:Loading extension module utils... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3412468433380127 seconds [default1]:Time to load utils op: 0.33053159713745117 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3115673065185547 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.34168195724487305 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3444795608520508 seconds [default4]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3310868740081787 seconds [default4]:Time to load utils op: 0.34441137313842773 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.33510804176330566 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3444962501525879 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3552207946777344 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3266723155975342 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3350942134857178 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3444528579711914 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31159305572509766 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3115963935852051 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38636350631713867 seconds [default7]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.32966160774230957 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.34119319915771484 seconds [default7]:Time to load utils op: 0.41251158714294434 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.35568881034851074 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35553479194641113 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31039953231811523 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3258814811706543 seconds [default1]:Time to load utils op: 0.32529497146606445 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.33588361740112305 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3266599178314209 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.32230353355407715 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3694272041320801 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4355344772338867 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3090672492980957 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.36979007720947266 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3140530586242676 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3694751262664795 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3706662654876709 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.32925844192504883 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3865318298339844 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.3138289451599121 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.35471081733703613 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3865342140197754 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4245936870574951 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.314791202545166 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3093559741973877 seconds [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3451509475708008 seconds [default6]:Loading extension module utils... [default5]:Time to load utils op: 0.3927123546600342 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3290274143218994 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31157541275024414 seconds [default5]:Loading extension module utils... [default6]:Time to load utils op: 0.3710465431213379 seconds [default5]:Time to load utils op: 0.3708913326263428 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.37670373916625977 seconds [default6]:Time to load utils op: 0.3927171230316162 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3926866054534912 seconds [default2]:Time to load utils op: 0.4074258804321289 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.329028844833374 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.37044239044189453 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3761327266693115 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.39272236824035645 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.40782594680786133 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.325176477432251 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3251051902770996 seconds [default2]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3710193634033203 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3756747245788574 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.34113645553588867 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3452177047729492 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4074256420135498 seconds [default2]:Time to load utils op: 0.30908823013305664 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3465571403503418 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.34569740295410156 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.40773463249206543 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3354473114013672 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3358297348022461 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3138120174407959 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3290233612060547 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.329007625579834 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.38657379150390625 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.33277368545532227 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.330080509185791 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3351116180419922 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3350484371185303 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35805583000183105 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.33508729934692383 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3584580421447754 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.33509111404418945 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.40350770950317383 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4035210609436035 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3510575294494629 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4035353660583496 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3579387664794922 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3220512866973877 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.350283145904541 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3346381187438965 seconds [default6]:Loading extension module utils... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.321819543838501 seconds [default6]:Time to load utils op: 0.32665348052978516 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3266627788543701 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.35035109519958496 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35041213035583496 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.40351080894470215 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3315083980560303 seconds [default7]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3137035369873047 seconds [default7]:Time to load utils op: 0.3355576992034912 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3360168933868408 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.33187437057495117 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.33631181716918945 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3292996883392334 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3865528106689453 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.33165669441223145 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.329495906829834 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0003695487976074219 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005974769592285156 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004930496215820312 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005433559417724609 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004582405090332031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0004558563232421875 seconds [default7]:Time to load utils op: 0.0005297660827636719 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005488395690917969 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007317066192626953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005981922149658203 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006282329559326172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007305145263671875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009050369262695312 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006239414215087891 seconds [default2]:Time to load utils op: 0.0005598068237304688 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007801055908203125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007853507995605469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007054805755615234 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006918907165527344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006020069122314453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005898475646972656 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004820823669433594 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0010197162628173828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.001125335693359375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005793571472167969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0012335777282714844 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010411739349365234 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005750656127929688 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00044536590576171875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0011057853698730469 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008540153503417969 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007066726684570312 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000850677490234375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009143352508544922 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008966922760009766 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007004737854003906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00044536590576171875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005335807800292969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006487369537353516 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000453948974609375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008535385131835938 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006046295166015625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006854534149169922 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010650157928466797 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.001051187515258789 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007359981536865234 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006167888641357422 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005750656127929688 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006225109100341797 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007429122924804688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005993843078613281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Time to load utils op: 0.0005395412445068359 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Time to load utils op: 0.0006124973297119141 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009338855743408203 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005753040313720703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default6]:Time to load utils op: 0.0009675025939941406 seconds [default2]:Time to load utils op: 0.0007688999176025391 seconds [default4]:Time to load utils op: 0.0010538101196289062 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009000301361083984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0010755062103271484 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005509853363037109 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008480548858642578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007488727569580078 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006380081176757812 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008449554443359375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006029605865478516 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005388259887695312 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006084442138671875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005698204040527344 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005500316619873047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0012562274932861328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007548332214355469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005669593811035156 seconds [default5]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005106925964355469 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006422996520996094 seconds [default5]:Time to load utils op: 0.0005812644958496094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004754066467285156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005395412445068359 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007097721099853516 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005030632019042969 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00080108642578125 seconds [default7]:Time to load utils op: 0.0006508827209472656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005793571472167969 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005578994750976562 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009474754333496094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005919933319091797 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0012595653533935547 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007846355438232422 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007550716400146484 seconds [default7]:Time to load utils op: 0.0006237030029296875 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010488033294677734 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008208751678466797 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006487369537353516 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005166530609130859 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006756782531738281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010597705841064453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009067058563232422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008938312530517578 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0011222362518310547 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006115436553955078 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006191730499267578 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009260177612304688 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004630088806152344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005958080291748047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006816387176513672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006239414215087891 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0007417201995849609 seconds [default3]:Time to load utils op: 0.0007152557373046875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007750988006591797 seconds [default1]:Time to load utils op: 0.0006837844848632812 seconds [default2]:Time to load utils op: 0.0006809234619140625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0010576248168945312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004258155822753906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000583648681640625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005578994750976562 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008032321929931641 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009050369262695312 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006144046783447266 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005707740783691406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008871555328369141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006518363952636719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005195140838623047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005004405975341797 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005557537078857422 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006225109100341797 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005831718444824219 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006194114685058594 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009696483612060547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007367134094238281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005536079406738281 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006663799285888672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005137920379638672 seconds [default4]:Time to load utils op: 0.0006175041198730469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009148120880126953 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007603168487548828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000579833984375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0004165172576904297 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006091594696044922 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008418560028076172 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006923675537109375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004949569702148438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0013470649719238281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006377696990966797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default2]:Time to load utils op: 0.0006551742553710938 seconds [default0]:Time to load utils op: 0.0005567073822021484 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008628368377685547 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0013573169708251953 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005464553833007812 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006320476531982422 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004894733428955078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007729530334472656 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Time to load utils op: 0.0005855560302734375 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007445812225341797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007529258728027344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0005035400390625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000652313232421875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005595684051513672 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005960464477539062 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009055137634277344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007121562957763672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007128715515136719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005919933319091797 seconds [default2]:Time to load utils op: 0.0008609294891357422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008044242858886719 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006947517395019531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007450580596923828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00046706199645996094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009045600891113281 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009126663208007812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000598907470703125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007519721984863281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default1]:Time to load utils op: 0.00036454200744628906 seconds [default2]:Time to load utils op: 0.0004913806915283203 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007419586181640625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00041985511779785156 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006620883941650391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00043511390686035156 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004222393035888672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006687641143798828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004379749298095703 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006132125854492188 seconds [default5]:Time to load utils op: 0.0004246234893798828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009455680847167969 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0009114742279052734 seconds [default3]:Time to load utils op: 0.0008347034454345703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005924701690673828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007834434509277344 seconds [default2]:Time to load utils op: 0.0009202957153320312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005812644958496094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006642341613769531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005085468292236328 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005900859832763672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0007383823394775391 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006327629089355469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006039142608642578 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005881786346435547 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007240772247314453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010833740234375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006244182586669922 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009119510650634766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010151863098144531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008220672607421875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009720325469970703 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006451606750488281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006973743438720703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000576019287109375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007698535919189453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Time to load utils op: 0.0007755756378173828 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007882118225097656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005483627319335938 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010242462158203125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006833076477050781 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006766319274902344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0006153583526611328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005793571472167969 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006306171417236328 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005521774291992188 seconds [default7]:Time to load utils op: 0.0009145736694335938 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0006349086761474609 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0010814666748046875 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011472702026367188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001806497573852539 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Time to load utils op: 0.0005884170532226562 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007472038269042969 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0016336441040039062 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Time to load utils op: 0.0015878677368164062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009529590606689453 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0015282630920410156 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0017445087432861328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Time to load utils op: 0.0006556510925292969 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0018258094787597656 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008678436279296875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009419918060302734 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.002004384994506836 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0017001628875732422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005407333374023438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008449554443359375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006530284881591797 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005538463592529297 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006570816040039062 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006866455078125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007038116455078125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006933212280273438 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005862712860107422 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006074905395507812 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006301403045654297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005552768707275391 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008134841918945312 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005207061767578125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005078315734863281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006039142608642578 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005075931549072266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005464553833007812 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006318092346191406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005297660827636719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006291866302490234 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008847713470458984 seconds [default6]:Time to load utils op: 0.0009250640869140625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006473064422607422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005893707275390625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005965232849121094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006880760192871094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007929801940917969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007088184356689453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005619525909423828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007483959197998047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006213188171386719 seconds [default5]:Time to load utils op: 0.0004363059997558594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006232261657714844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007264614105224609 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006649494171142578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:[2022-09-03 18:48:56,937] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 18:48:56,938] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 18:48:56,938] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 18:48:56,938] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 18:48:56,938] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default0]:[2022-09-03 18:48:56,981] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 18:48:56,982] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:48:56,982] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30474019050598145 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3048586845397949 seconds [default4]:ninja: no work to do. [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.22147488594055176 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00046896934509277344 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2085716724395752 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2023012638092041 seconds [default0]:[2022-09-03 18:48:57,213] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-03 18:48:57,214] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:48:57,214] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3044891357421875 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20861363410949707 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20845246315002441 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0019183158874511719 seconds [default0]:[2022-09-03 18:48:57,284] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 18:48:57,285] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:48:57,285] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:[2022-09-03 18:48:57,312] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 18:48:57,312] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:48:57,312] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00044918060302734375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.001939535140991211 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.002056121826171875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041675567626953125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00032639503479003906 seconds [default0]:[2022-09-03 18:48:57,340] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 18:48:57,340] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:48:57,340] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:[2022-09-03 18:48:57,367] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 18:48:57,368] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:48:57,368] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:[2022-09-03 18:48:57,444] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 18:48:57,444] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:48:57,444] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:[2022-09-03 18:48:57,471] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 18:48:57,472] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:48:57,472] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.54 GB, percent = 7.3% [default0]:[2022-09-03 18:48:57,472] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 
[default0]:[2022-09-03 18:48:57,472] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 18:48:57,472] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 18:48:57,472] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 18:48:57,472] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 18:48:57,472] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 18:48:57,472] [INFO] [config.py:991:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 18:48:57,472] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 18:48:57,472] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] autotuning_config ............ 
{ [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 18:48:57,473] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-03 18:48:57,474] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005314350128173828 seconds [default0]:[2022-09-03 18:48:57,474] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,091] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:48:58,092] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:Traceback (most recent call last):
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]:    main()
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]:    return f(*args, **kwargs)
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]:    pretrain(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default4]:    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default4]:    args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default4]:    loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default4]:    success = self._load_zero_checkpoint(
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default4]:    self.optimizer.load_state_dict(
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default4]:    self._load_universal_checkpoint(checkpoint_folder,
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default4]:    self._load_hp_checkpoint_state(checkpoint_folder)
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default4]:    lp.load_hp_checkpoint_state(
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default4]:    assert os.path.isfile(file), f'{file} is not a valid file'
[default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default6]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:Traceback (most recent call last):
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]:    main()
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]:    return f(*args, **kwargs)
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]:    pretrain(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default0]:    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default0]:    args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default4]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:    loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default0]:    success = self._load_zero_checkpoint(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]:    self.optimizer.load_state_dict(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default0]:    self._load_universal_checkpoint(checkpoint_folder,
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default0]:    self._load_hp_checkpoint_state(checkpoint_folder)
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default0]:    lp.load_hp_checkpoint_state(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default0]:    assert os.path.isfile(file), f'{file} is not a valid file'
[default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:Traceback (most recent call last):
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]:    main()
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]:    return f(*args, **kwargs)
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]:    pretrain(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default0]:    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default0]:    args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default0]:    loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default0]:    success = self._load_zero_checkpoint(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]:    self.optimizer.load_state_dict(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default0]:    self._load_universal_checkpoint(checkpoint_folder,
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default0]:    self._load_hp_checkpoint_state(checkpoint_folder)
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default0]:    lp.load_hp_checkpoint_state(
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default0]:    assert os.path.isfile(file), f'{file} is not a valid file'
[default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default0]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:Traceback (most recent call last):
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default2]:    main()
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]:    return f(*args, **kwargs)
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:    pretrain(
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default2]:    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default2]:    args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default2]:    loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default7]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default2]:    success = self._load_zero_checkpoint(
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:    self.optimizer.load_state_dict(
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default2]:    self._load_universal_checkpoint(checkpoint_folder,
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default2]:    self._load_hp_checkpoint_state(checkpoint_folder)
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default2]:    lp.load_hp_checkpoint_state(
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default2]:    assert os.path.isfile(file), f'{file} is not a valid file'
[default7]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in
_load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606,
in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default7]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: main() [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: 
self._load_universal_checkpoint(checkpoint_folder, [default1]: pretrain( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.dense.weight/fp32.pt is not a valid file [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: success = self._load_zero_checkpoint( [default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default6]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default2]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.dense.weight/fp32.pt is not a valid file [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default6]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]:Traceback (most recent call last): [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.dense.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: 
self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default4]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default2]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default2]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default2]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default3]: main()
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: pretrain(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default3]: success = self._load_zero_checkpoint(
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default3]: self.optimizer.load_state_dict(
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default1]:Traceback (most recent call last):
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: self._load_universal_checkpoint(checkpoint_folder,
[default1]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default3]: self._load_hp_checkpoint_state(checkpoint_folder)
[default5]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:Traceback (most recent call last): [default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]: lp.load_hp_checkpoint_state( [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.dense.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:Traceback (most recent call last): [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint 
from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: return f(*args, **kwargs) [default1]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]: self.optimizer.load_state_dict( [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: pretrain( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, 
in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.dense.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:[2022-09-03 18:48:58,996] [INFO] 
[torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict =
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.dense.weight/fp32.pt is not a valid file [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default0]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: pretrain( [default1]: main() [default1]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default4]: self.optimizer.load_state_dict( 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not 
a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:59,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: pretrain( [default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.dense.weight/fp32.pt is not a valid file [default6]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606,
in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.dense.weight/fp32.pt is not a valid file [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer,
lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] 
Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:[2022-09-03 18:48:58,985] [INFO] 
[torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: main() [default0]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: pretrain( [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:Traceback (most recent call last): [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: success = self._load_zero_checkpoint( [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: success = self._load_zero_checkpoint(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default1]: self.optimizer.load_state_dict(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default1]: self._load_universal_checkpoint(checkpoint_folder,
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default1]: self._load_hp_checkpoint_state(checkpoint_folder)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default1]: lp.load_hp_checkpoint_state(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default1]: assert os.path.isfile(file), f'{file} is not a valid file'
[default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default0]: success = self._load_zero_checkpoint(
[default6]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:[2022-09-03 18:48:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default2]: lp.load_hp_checkpoint_state(
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default2]: assert os.path.isfile(file), f'{file} is not a valid file'
[default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default6]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default0]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default5]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:48:59,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:Traceback (most recent call last): [default1]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:Traceback (most recent call last): [default7]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default5]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: pretrain( [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Traceback (most recent call last): [default1]:Traceback (most recent call 
last): [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:Traceback (most recent call last): [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default7]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:[2022-09-03 18:48:59,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.dense.weight/fp32.pt is not a valid file [default2]: pretrain( [default6]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: success = self._load_zero_checkpoint( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default3]: pretrain( [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) 
[default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.dense.weight/fp32.pt is not a valid file [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: 
self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default0]: success = self._load_zero_checkpoint(
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]: self.optimizer.load_state_dict(
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default0]: self._load_universal_checkpoint(checkpoint_folder,
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default0]: self._load_hp_checkpoint_state(checkpoint_folder)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default0]: lp.load_hp_checkpoint_state(
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default0]: assert os.path.isfile(file), f'{file} is not a valid file'
[default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default3]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:59,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:59,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]: main() [default2]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:Traceback (most recent call last): [default3]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: main() [default6]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default7]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]: main()
[default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]: return f(*args, **kwargs)
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default2]: main()
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]: return f(*args, **kwargs)
[default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: self.optimizer.load_state_dict( [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: pretrain( [default2]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default5]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.dense.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:Traceback (most recent call last): [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: pretrain( [default1]:Traceback (most recent call last): [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: main() [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default5]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: pretrain( [default4]:Traceback (most recent call last): [default2]: self.optimizer.load_state_dict( [default0]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) [default3]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default3]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: return f(*args, **kwargs) [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a 
valid file [default4]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: pretrain( [default6]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default6]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:Traceback (most recent call last): [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default5]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: main() [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]:Traceback (most recent call last): [default5]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: pretrain( [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: main() [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Traceback (most recent call last): [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:59,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: main() [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: return f(*args, **kwargs) [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self._load_universal_checkpoint(checkpoint_folder, [default1]: success = self._load_zero_checkpoint( [default4]: self.optimizer.load_state_dict( [default0]:[2022-09-03 18:48:59,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:48:59,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default7]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: self.optimizer.load_state_dict( [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: success = self._load_zero_checkpoint( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in 
_load_hp_checkpoint_state [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: main() [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default0]: pretrain( [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: self.optimizer.load_state_dict( [default6]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default5]:[2022-09-03 18:48:58,984] 
[INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]: success = self._load_zero_checkpoint( [default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: lp.load_hp_checkpoint_state(
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default4]: self._load_hp_checkpoint_state(checkpoint_folder)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default5]: self._load_hp_checkpoint_state(checkpoint_folder)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default5]: lp.load_hp_checkpoint_state(
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default5]: assert os.path.isfile(file), f'{file} is not a valid file'
[default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default4]: lp.load_hp_checkpoint_state(
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:Traceback (most recent call last):
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]: assert os.path.isfile(file), f'{file} is not a valid file'
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default3]: pretrain(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default3]: success = self._load_zero_checkpoint(
[default0]: self._load_universal_checkpoint(checkpoint_folder,
[default2]: assert os.path.isfile(file), f'{file} is not a valid file'
[default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: pretrain(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default2]: assert os.path.isfile(file), f'{file} is not a valid file'
[default7]:Traceback (most recent call last):
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default0]: self._load_hp_checkpoint_state(checkpoint_folder)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default1]: self.optimizer.load_state_dict(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default4]: assert os.path.isfile(file), f'{file} is not a valid file'
[default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default6]: main()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]: return f(*args, **kwargs)
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: self.optimizer.load_state_dict(
[default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default7]: main()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: return f(*args, **kwargs)
[default3]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default6]: pretrain(
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default1]: self._load_universal_checkpoint(checkpoint_folder,
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default0]: success = self._load_zero_checkpoint(
[default6]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default4]: success = self._load_zero_checkpoint(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default1]: self._load_hp_checkpoint_state(checkpoint_folder)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]: self.optimizer.load_state_dict(
[default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default0]: self._load_universal_checkpoint(checkpoint_folder,
[default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default1]: lp.load_hp_checkpoint_state(
[default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default2]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self.optimizer.load_state_dict( [default4]: self.optimizer.load_state_dict( [default0]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default7]: self._load_universal_checkpoint(checkpoint_folder, [default6]: self.optimizer.load_state_dict( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.dense.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self.optimizer.load_state_dict( [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.dense.weight/fp32.pt is not a valid file [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default2]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.dense.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.dense.weight/fp32.pt is not a valid file [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: pretrain( [default0]: main() [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: pretrain( [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: return f(*args, **kwargs) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: success = self._load_zero_checkpoint( [default3]: model, 
optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: 
args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: return f(*args, **kwargs) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} 
is not a valid file' [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, 
in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: self._load_universal_checkpoint(checkpoint_folder, 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default0]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: 
lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.dense.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: main() [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.dense.weight/fp32.pt is not a valid file [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading 
checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in 
_load_hp_checkpoint_state [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
[default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]:
self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.dense.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} 
is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: success = self._load_zero_checkpoint( [default0]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler =
setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.dense.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.dense.weight/fp32.pt is not a valid file [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.dense.weight/fp32.pt is not a valid file [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
[default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state 
[default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]:Traceback (most recent call last): [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]: main()
[default3]:[2022-09-03 18:48:58,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default3]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default3]: self._load_universal_checkpoint(checkpoint_folder,
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default3]: self._load_hp_checkpoint_state(checkpoint_folder)
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default3]: lp.load_hp_checkpoint_state(
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default3]: assert os.path.isfile(file), f'{file} is not a valid file'
[default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.dense.weight/fp32.pt is not a valid file
[default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default6]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default1]: success = self._load_zero_checkpoint(
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default1]: self.optimizer.load_state_dict(
[default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default2]: self.optimizer.load_state_dict( [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:Traceback (most recent call last): [default2]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: return f(*args, **kwargs) [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default7]: pretrain( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.dense.weight/fp32.pt is not a valid file [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.dense.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: 
model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default4]: assert os.path.isfile(file), f'{file} is not a valid file' 
[default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: 
self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:59,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.dense.weight/fp32.pt is not a valid file [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: pretrain( [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: success = self._load_zero_checkpoint( [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: self._load_universal_checkpoint(checkpoint_folder, [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default0]: main() [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.dense.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]:Traceback (most recent call last): [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default7]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: success = self._load_zero_checkpoint( [default2]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: pretrain( [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return 
f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.dense.weight/fp32.pt is not a valid file [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: lp.load_hp_checkpoint_state( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.dense.weight/fp32.pt is not a valid file [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.dense.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, 
optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:[2022-09-03 18:48:58,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: 
self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 
643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.dense.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
[default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default5]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: pretrain( [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 
2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in 
load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 
606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: 
self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.dense.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: 
self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.dense.weight/fp32.pt is not a valid file [default3]:Traceback (most recent call last): [default3]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:Traceback (most recent call last): [default3]: main() [default4]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default2]:[2022-09-03 18:48:58,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default0]: main() [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, 
**kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default3]: self._load_universal_checkpoint(checkpoint_folder, [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]:[2022-09-03 18:48:58,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.dense.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = 
self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: main() [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]: self.optimizer.load_state_dict( [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: return f(*args, **kwargs) [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success 
= self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: pretrain( [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: return f(*args, **kwargs) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: self.optimizer.load_state_dict( [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), 
f'{file} is not a valid file' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.dense.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.dense.weight/fp32.pt is not a valid file [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: success = self._load_zero_checkpoint( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: self.optimizer.load_state_dict( [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: main() [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, 
optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: lp.load_hp_checkpoint_state( [default7]:[2022-09-03 18:48:58,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]: self.optimizer.load_state_dict( [default6]: success = self._load_zero_checkpoint( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = 
self._load_zero_checkpoint( [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: lp.load_hp_checkpoint_state( [default6]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: success = self._load_zero_checkpoint( [default3]: success = self._load_zero_checkpoint( [default6]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: lp.load_hp_checkpoint_state( [default5]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default4]: self._load_universal_checkpoint(checkpoint_folder, [default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default7]:[2022-09-03 18:48:58,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.dense.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: 
lp.load_hp_checkpoint_state( [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.dense.weight/fp32.pt is not a valid file [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default7]: self._load_universal_checkpoint(checkpoint_folder, [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.dense.weight/fp32.pt is not a valid file [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return 
f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) 
[default6]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.dense.weight/fp32.pt is not a valid file [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.dense.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, 
**kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.dense.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: pretrain( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: 
self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: 
self._load_universal_checkpoint(checkpoint_folder, [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
[default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.dense.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]: lp.load_hp_checkpoint_state( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.dense.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", 
line 659, in _load_hp_checkpoint_state [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: lp.load_hp_checkpoint_state( [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: success = self._load_zero_checkpoint( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default7]:[2022-09-03 18:48:58,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: 
self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in 
load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.dense.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) 
[default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 
643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", 
line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.dense.weight/fp32.pt is not a valid file [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, 
in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert 
os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.dense.weight/fp32.pt is not a valid file [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file
[default5]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt...
[default5]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.dense.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default7]:Traceback 
(most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: return f(*args, **kwargs) [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]:Traceback (most recent call last): [default6]: loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: self.optimizer.load_state_dict( [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]:Traceback (most recent call last): [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.dense.weight/fp32.pt is not a valid file [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.dense.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint 
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: success = self._load_zero_checkpoint( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.dense.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: 
self._load_universal_checkpoint(checkpoint_folder,
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default7]: self._load_hp_checkpoint_state(checkpoint_folder)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default7]: lp.load_hp_checkpoint_state(
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default7]: assert os.path.isfile(file), f'{file} is not a valid file'
[default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.dense.weight/fp32.pt is not a valid file
[default7]:Traceback (most recent call last):
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]: main()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: return f(*args, **kwargs)
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default7]: success = self._load_zero_checkpoint(
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default7]: self.optimizer.load_state_dict(
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default7]: self._load_universal_checkpoint(checkpoint_folder,
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
[default7]: self._load_hp_checkpoint_state(checkpoint_folder)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
[default7]: lp.load_hp_checkpoint_state(
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
[default7]: assert os.path.isfile(file), f'{file} is not a valid file'
[default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.dense.weight/fp32.pt is not a valid file
[default6]:Traceback (most recent call last):
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]: main()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]: return f(*args, **kwargs)
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]: pretrain(
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
[default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
[default6]: success = self._load_zero_checkpoint(
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
[default6]: self.optimizer.load_state_dict(
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
[default6]:
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: success = self._load_zero_checkpoint( [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.dense.weight/fp32.pt is not a valid file [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]:
self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.dense.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606,
in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:[2022-09-03 18:48:58,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:48:58,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer,
lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:[2022-09-03 18:48:58,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] 
Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:48:58,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:48:59,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.dense.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]:
self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]:
self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]:
self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:[2022-09-03 18:48:58,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:48:58,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:48:58,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.dense.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, 
optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default6]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default6]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default6]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default6]: success = self._load_zero_checkpoint( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default6]: self.optimizer.load_state_dict( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default6]: 
self._load_universal_checkpoint(checkpoint_folder, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default6]: self._load_hp_checkpoint_state(checkpoint_folder) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default6]: lp.load_hp_checkpoint_state( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default6]: assert os.path.isfile(file), f'{file} is not a valid file' [default6]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default7]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default7]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default7]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default7]: success = self._load_zero_checkpoint( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default7]: self.optimizer.load_state_dict( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default7]: self._load_universal_checkpoint(checkpoint_folder, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default7]: self._load_hp_checkpoint_state(checkpoint_folder) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]: lp.load_hp_checkpoint_state( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default7]: assert os.path.isfile(file), f'{file} is not a valid file' [default7]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.dense.weight/fp32.pt is not a valid file [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default5]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default5]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default5]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default5]: success = self._load_zero_checkpoint( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default5]: self.optimizer.load_state_dict( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default5]: 
self._load_universal_checkpoint(checkpoint_folder, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default5]: self._load_hp_checkpoint_state(checkpoint_folder) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default5]: lp.load_hp_checkpoint_state( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default5]: assert os.path.isfile(file), f'{file} is not a valid file' [default5]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.dense.weight/fp32.pt is not a valid file [default0]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: 
self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:48:59,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: pretrain( [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, 
lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default6]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default7]:[2022-09-03 18:48:59,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default2]: lp.load_hp_checkpoint_state( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = 
self._load_zero_checkpoint( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: lp.load_hp_checkpoint_state( [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.dense.weight/fp32.pt is not a valid file [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:48:59,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default0]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default2]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default1]:[2022-09-03 18:48:59,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt.
[default4]:Traceback (most recent call last):
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: return f(*args, **kwargs)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
[default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
[default4]: args.iteration = load_checkpoint(model, optimizer,
lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] 
Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:48:59,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:[2022-09-03 18:48:59,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/mp_rank_00_model_states.pt. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, 
in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
207, in [default4]: main() [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default4]: model, 
optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: assert os.path.isfile(file), f'{file} is not a valid 
file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: pretrain( [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.dense.weight/fp32.pt is not a valid file [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: main() [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: pretrain( [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: self.optimizer.load_state_dict( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: 
success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: success = self._load_zero_checkpoint( [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: main() [default2]: self._load_hp_checkpoint_state(checkpoint_folder) 
[default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.dense.weight/fp32.pt is not a valid file [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]: lp.load_hp_checkpoint_state( [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.dense.weight/fp32.pt is not a valid file [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default0]: self.optimizer.load_state_dict( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: 
self._load_universal_checkpoint(checkpoint_folder, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default0]: lp.load_hp_checkpoint_state( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: 
self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.dense.weight/fp32.pt is not a valid file [default1]: lp.load_hp_checkpoint_state( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default1]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default1]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default1]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default1]: success = self._load_zero_checkpoint( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default1]: self.optimizer.load_state_dict( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default1]: self._load_universal_checkpoint(checkpoint_folder, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default1]: self._load_hp_checkpoint_state(checkpoint_folder) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default1]: lp.load_hp_checkpoint_state( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default1]: assert os.path.isfile(file), f'{file} is not a valid file' [default1]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain 
[default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default2]: success = self._load_zero_checkpoint( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default2]: self._load_universal_checkpoint(checkpoint_folder, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: 
lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default0]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default2]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default2]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default4]: main() [default2]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default2]: success = self._load_zero_checkpoint( 
[default0]: success = self._load_zero_checkpoint( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default2]: self.optimizer.load_state_dict( [default0]: self.optimizer.load_state_dict( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default0]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default2]: self._load_universal_checkpoint(checkpoint_folder, 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default4]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default4]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default2]: self._load_hp_checkpoint_state(checkpoint_folder) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default2]: lp.load_hp_checkpoint_state( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default4]: success = self._load_zero_checkpoint( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default4]: self.optimizer.load_state_dict( [default0]: self._load_hp_checkpoint_state(checkpoint_folder) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain [default3]: model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer [default0]: lp.load_hp_checkpoint_state( [default2]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]: args.iteration = load_checkpoint(model, optimizer, lr_scheduler) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default0]: assert os.path.isfile(file), f'{file} is not a valid file' [default0]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file [default2]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default4]: self._load_universal_checkpoint(checkpoint_folder, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default4]: self._load_hp_checkpoint_state(checkpoint_folder) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default4]: lp.load_hp_checkpoint_state( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default4]: assert os.path.isfile(file), f'{file} is not a valid file' [default4]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint [default3]: loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint [default3]: success = self._load_zero_checkpoint( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint [default3]: self.optimizer.load_state_dict( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict [default3]: self._load_universal_checkpoint(checkpoint_folder, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint [default3]: self._load_hp_checkpoint_state(checkpoint_folder) [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state [default3]: lp.load_hp_checkpoint_state( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state [default3]: assert os.path.isfile(file), f'{file} is not a valid file' [default3]:AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.dense.weight/fp32.pt is not a valid file ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1970617) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1551169) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3591588) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1442002) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 369545) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1318553) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3607219) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1579369) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1372949) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1890007) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 419186) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3633541) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2669194) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3151960) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2930358) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1979622) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2636555) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 247679) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3039510) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2980542) of 
binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 407243) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3953429) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2227030) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1959808) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 511876) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 512760) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3019358) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1713687) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1777221) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3913404) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2133736) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2016061) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3630695) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 926765) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3783299) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1799482) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 926766) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 926767) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 926768) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 926769) exec(code, run_globals) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 926770) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 278 (local_rank: 6) exitcode : 1 (pid: 926771) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 279 (local_rank: 7) exitcode : 1 (pid: 926772) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/71.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam47-ib0 rank : 272 (local_rank: 0) exitcode : 1 (pid: 926765) error_file: /tmp/torchelastic_11z4q9rj/none_fiyyucit/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/70.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( raise ChildFailedError( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3039511) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 419187) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 419188) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3039512) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 419189) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3039513) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, 
optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 419190) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3039514) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 419191) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3039515) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/5/error.json traceback : Traceback (most recent call last): return _run_code(code, main_globals, None, error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 118 (local_rank: 6) exitcode : 1 (pid: 419192) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 238 (local_rank: 6) exitcode : 1 (pid: 3039516) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam26-ib0 rank : 119 (local_rank: 7) exitcode : 1 (pid: 419193) error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", 
line 2756, in _load_zero_checkpoint
    self.optimizer.load_state_dict(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
    self._load_universal_checkpoint(checkpoint_folder,
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
    self._load_hp_checkpoint_state(checkpoint_folder)
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
    lp.load_hp_checkpoint_state(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
    return _run_code(code, main_globals, None,
    assert os.path.isfile(file), f'{file} is not a valid file'
AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/31.self_attention.dense.weight/fp32.pt is not a valid file
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2022-09-03_18:48:58
  host : jean-zay-iam26-ib0
  rank : 112 (local_rank: 0)
  exitcode : 1 (pid: 419186)
  error_file: /tmp/torchelastic_0o9qfyu6/none_v2sdxv_2/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
    pretrain(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer
    args.iteration = load_checkpoint(model, optimizer, lr_scheduler)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint
    loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states)
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint
    success = self._load_zero_checkpoint(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint
    self.optimizer.load_state_dict(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict
    self._load_universal_checkpoint(checkpoint_folder,
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint
    self._load_hp_checkpoint_state(checkpoint_folder)
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state
    lp.load_hp_checkpoint_state(
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state
    assert os.path.isfile(file), f'{file} is not a valid file'
AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/30.self_attention.query_key_value.weight/fp32.pt is not a valid file
============================================================
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    exec(code, run_globals)
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    main()
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code
    return _run_code(code, main_globals, None,
    return _run_code(code, main_globals, None,
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code
    run(args)
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1372950) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1372951) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 211 
(local_rank: 3) exitcode : 1 (pid: 1372952) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain return _run_code(code, main_globals, None, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 3953430) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in 
load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1372953) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return f(*args, **kwargs) return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1372954) main() run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/5/error.json traceback : Traceback (most recent 
call last): Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 (pid: 3953431) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 214 (local_rank: 6) exitcode : 1 (pid: 1372955) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 215 (local_rank: 7) exitcode : 1 (pid: 1372956) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 3953432) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/55.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1372949) error_file: /tmp/torchelastic_ob8i5j2l/none_bzoobber/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 3953433) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint main() error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/54.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 3953434) raise ChildFailedError( error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/5/error.json traceback : Traceback (most recent call last): Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 3953435) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/6/error.json traceback : Traceback (most recent call last): return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 
193 (local_rank: 1) exitcode : 1 (pid: 3151961) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in 
load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 3953436) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in raise ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/13.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 3953429) error_file: /tmp/torchelastic_vnep2ski/none_nz_plgbm/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return _run_code(code, main_globals, None, elastic_launch( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return _run_code(code, main_globals, None, assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/12.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 194 (local_rank: 2) exitcode : 1 (pid: 3151962) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/2/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 247680) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' return _run_code(code, main_globals, None, AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3151963) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3151964) exec(code, run_globals) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return _run_code(code, main_globals, None, elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3151965) elastic_launch( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 247681) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ error_file: 
/tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( raise ChildFailedError( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 198 (local_rank: 6) exitcode : 1 (pid: 3151966) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/6/error.json traceback : Traceback (most recent call last): exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : 
jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 247682) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( elastic_launch( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 
606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 247683) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 
2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 199 (local_rank: 7) exitcode : 1 (pid: 3151967) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 65 (local_rank: 1) exitcode : 1 (pid: 1970618) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 247684) raise ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/51.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3151960) error_file: /tmp/torchelastic_gf6ie1a4/none_m9y1ttrl/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/5/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, 
list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/50.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 126 (local_rank: 6) exitcode : 1 (pid: 247685) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/6/error.json traceback : Traceback (most recent call last): exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( 
main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1318554) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 127 (local_rank: 7) exitcode : 1 (pid: 247686) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( return _run_code(code, main_globals, None, main() main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 1970619) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/2/error.json traceback : Traceback (most recent call last): 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1777222) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/33.self_attention.dense.weight/fp32.pt is not a valid file 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 247679) error_file: /tmp/torchelastic_zs96aqgd/none_qmkeifmx/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) return launch_agent(self._config, self._entrypoint, list(args)) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1890008) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/32.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 67 (local_rank: 3) exitcode : 1 (pid: 
1970620) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1318555) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain exec(code, run_globals) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, raise ChildFailedError( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 1970621) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1777223) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 
2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1890009) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1318556) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/3/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 1970622) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/5/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 512761) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 220 (local_rank: 4) exitcode : 1 (pid: 1318557) elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 70 (local_rank: 6) exitcode : 1 (pid: 1970623) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1777224) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1890010) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1318558) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 146 (local_rank: 2) exitcode : 1 (pid: 512762) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1777225) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 12 (local_rank: 4) exitcode : 1 (pid: 1890011) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 1970624) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 222 (local_rank: 6) exitcode : 1 (pid: 1318559) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, 
state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1777226) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1890012) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/19.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam11-ib0 rank : 64 (local_rank: 0) exitcode : 1 (pid: 1970617) error_file: /tmp/torchelastic_u35whuef/none_u7fymesa/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/5/error.json traceback : Traceback (most recent call last): run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, 
in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 512763) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 223 (local_rank: 7) exitcode : 1 (pid: 1318560) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) run(args) return launch_agent(self._config, self._entrypoint, list(args)) main() loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/18.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 286 (local_rank: 6) exitcode : 1 (pid: 1777227) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 14 (local_rank: 6) exitcode : 1 (pid: 1890013) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state main() main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint return 
launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 512764) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/57.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam40-ib0 rank : 216 (local_rank: 0) exitcode : 1 (pid: 1318553) error_file: /tmp/torchelastic_h2mnc2q1/none_hq2v8f17/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 
2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 287 (local_rank: 7) exitcode : 1 (pid: 1777228) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 15 (local_rank: 7) exitcode : 1 (pid: 1890014) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 512765) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/56.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 2980543) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/5/error.json traceback : Traceback 
(most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( return f(*args, **kwargs) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1777221) error_file: /tmp/torchelastic_corun968/none_uada1h9s/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/5.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1890007) error_file: /tmp/torchelastic__hzjwr0w/none_rh4rtacd/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint raise ChildFailedError( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : 
jean-zay-iam31-ib0 rank : 150 (local_rank: 6) exitcode : 1 (pid: 512766) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/6/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/72.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 3633542) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/4.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 2980544) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 1959809) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 151 (local_rank: 7) exitcode : 1 (pid: 512767) error_file: 
/tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration 
= load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint elastic_launch( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/39.self_attention.dense.weight/fp32.pt is not a valid file 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 512760) error_file: /tmp/torchelastic_5k5_yzs8/none_4vitg8_p/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, 
in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/38.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 2980545) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 3633543) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 1959810) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( raise ChildFailedError( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, elastic_launch( raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) run(args) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 2980546) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent 
raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint self.optimizer.load_state_dict( error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1799483) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' main() torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 
17 (local_rank: 1) exitcode : 1 (pid: 1979623) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 2930359) error_file: 
/tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3633544) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 
2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 1959811) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 2980547) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 129 (local_rank: 1) exitcode : 1 (pid: 3607220) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: 
/tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in 
load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3633545) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 1959812) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint 
success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) raise ChildFailedError( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 246 (local_rank: 6) exitcode : 1 (pid: 2980548) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/6/error.json traceback : Traceback (most recent call last): success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1551170) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 1979624) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 
2930360) error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 130 (local_rank: 2) exitcode : 1 (pid: 3607221) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3633546) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 1959813) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1799484) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 257 (local_rank: 1) exitcode : 1 (pid: 407244) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/5/error.json traceback : Traceback (most recent call last): elastic_launch( error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in 
load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 247 (local_rank: 7) exitcode : 1 (pid: 2980549) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 1979625) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 2930361) error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1551171) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3607222) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3633547) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/6/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 78 (local_rank: 6) exitcode : 1 (pid: 1959814) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/6/error.json traceback : Traceback (most recent call last): model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/63.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam43-ib0 rank : 240 (local_rank: 0) exitcode : 1 (pid: 2980542) error_file: /tmp/torchelastic_l4cexzit/none_aav87ho0/attempt_0/0/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 407245) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 1979626) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return launch_agent(self._config, self._entrypoint, list(args)) AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 187 (local_rank: 3) exitcode : 1 (pid: 1799485) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 2930362) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = 
self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3607223) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", 
line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3633548) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 79 (local_rank: 7) exitcode : 1 (pid: 1959815) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/62.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) 
exitcode : 1 (pid: 1551172) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 21 (local_rank: 5) exitcode : 1 (pid: 1979627) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1799486) AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 407246) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 2930363) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3607224) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 33 (local_rank: 1) exitcode : 1 (pid: 3630696) error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/5/error.json traceback : Traceback (most recent call last): raise ChildFailedError( error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/5/error.json traceback : Traceback (most recent call last): assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/3.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam02-ib0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 3633541) error_file: /tmp/torchelastic_geyvv46l/none__bh30osj/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/21.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 1959808) error_file: /tmp/torchelastic_9lazsfau/none__0v9c4y8/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1551173) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1799487) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 407247) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in 
_load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 22 (local_rank: 6) exitcode : 1 (pid: 1979628) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/6/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2636556) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2669195) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, 
optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 54 (local_rank: 6) exitcode : 1 (pid: 2930364) error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 134 (local_rank: 6) exitcode : 1 (pid: 3607225) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/6/error.json traceback : Traceback (most recent call last): assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/tied_modules.embed.word_embeddings.weight/fp32.pt is not a valid file ============================================================ success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/20.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 153 (local_rank: 1) exitcode : 1 (pid: 511877) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in 
load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1551174) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 3630697) error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 
2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 407248) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/5/error.json traceback : Traceback (most recent call last): loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint 
self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a 
valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 190 (local_rank: 6) exitcode : 1 (pid: 1799488) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/6/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, 
optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 23 (local_rank: 7) exitcode : 1 (pid: 1979629) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 55 (local_rank: 7) exitcode : 1 (pid: 2930365) error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 135 (local_rank: 7) exitcode : 1 (pid: 3607226) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2636557) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 511878) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2669196) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 182 (local_rank: 6) exitcode : 1 (pid: 1551175) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/6/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3630698) error_file: 
/tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 262 (local_rank: 6) exitcode : 1 (pid: 407249) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/6/error.json traceback : Traceback (most recent call last): success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/7.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 (pid: 1979622) error_file: /tmp/torchelastic_125p5e71/none_3itys_o5/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint 
success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 191 (local_rank: 7) exitcode : 1 (pid: 1799489) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/15.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam08-ib0 rank : 48 (local_rank: 0) exitcode : 1 (pid: 2930358) error_file: /tmp/torchelastic_bgkkpqcp/none_fq69h1cd/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/35.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3607219) error_file: /tmp/torchelastic_kvrkkbj8/none_yu2aod8j/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3630699) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 99 (local_rank: 3) exitcode : 1 (pid: 2636558) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 511879) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2669197) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 183 (local_rank: 7) exitcode : 1 (pid: 1551176) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/6.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/49.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1799482) error_file: /tmp/torchelastic_ltm57uni/none_u1zfalgu/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 407250) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/14.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert 
os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/34.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3630700) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2636559) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 
host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 511880) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2669198) raise ChildFailedError( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = 
self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/47.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1551169) error_file: /tmp/torchelastic_xkf7rco8/none_4n03omvw/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: 
/tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/48.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/67.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 407243) error_file: /tmp/torchelastic_op9mflre/none_rxega3j6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint 
loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2636560) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 511881) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2669199) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/46.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3630701) error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/6/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/5/error.json traceback : Traceback (most recent call last): assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/66.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 265 (local_rank: 1) exitcode : 1 (pid: 3913405) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 102 (local_rank: 6) exitcode : 1 (pid: 2636561) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 511882) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 230 (local_rank: 6) exitcode : 1 (pid: 2669200) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 39 (local_rank: 7) exitcode : 1 (pid: 3630702) error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) loaded_dir, state_dict = 
model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", 
line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/11.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3630695) error_file: /tmp/torchelastic_8e_1zy13/none_djsx1hvi/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 103 (local_rank: 7) exitcode : 1 (pid: 2636562) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 159 (local_rank: 7) exitcode : 1 (pid: 511883) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 231 (local_rank: 7) exitcode : 1 (pid: 2669201) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 266 (local_rank: 2) exitcode : 1 (pid: 3913406) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain raise ChildFailedError( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint raise ChildFailedError( success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) raise 
ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/10.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/27.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam18-ib0 rank : 96 (local_rank: 0) exitcode : 1 (pid: 2636555) error_file: /tmp/torchelastic_2_o51g2h/none_yf3ylyhy/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/41.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam32-ib0 rank : 152 (local_rank: 0) exitcode : 1 (pid: 511876) error_file: /tmp/torchelastic_blk3xnes/none_vbtq9eq2/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/59.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2669194) error_file: /tmp/torchelastic_ogledwd8/none_jhr34_yn/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( raise ChildFailedError( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 267 
(local_rank: 3) exitcode : 1 (pid: 3913407) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3783300) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/26.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint 
self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/58.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1442003) error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1579370) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/40.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 369546) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 3913408) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in 
load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 3913409) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3783301) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/5/error.json traceback : 
Traceback (most recent call last): assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1442004) error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) 
assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 369547) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) assert 
os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1579371) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 270 (local_rank: 6) exitcode : 1 (pid: 3913410) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 3783302) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1579372) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 271 (local_rank: 7) exitcode : 1 (pid: 3913411) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1442005) error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 369548) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", 
line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3783303) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, raise ChildFailedError( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 252 (local_rank: 4) exitcode : 1 (pid: 1579373) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/69.self_attention.dense.weight/fp32.pt is not a valid file 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) exitcode : 1 (pid: 3913404) error_file: /tmp/torchelastic_s350fq75/none_2nvwyvsf/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1442006) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 369549) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain 
model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3783304) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 1 (pid: 1579374) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/68.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), 
f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1442007) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 369550) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/5/error.json traceback : Traceback (most recent call last): loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2016062) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 206 (local_rank: 6) exitcode : 1 (pid: 3783305) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 254 (local_rank: 6) exitcode : 1 (pid: 1579375) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 110 (local_rank: 6) exitcode : 1 (pid: 1442008) error_file: 
/tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/6/error.json traceback : Traceback (most recent call last): loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 166 (local_rank: 6) exitcode : 1 (pid: 369551) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 207 (local_rank: 7) exitcode : 1 (pid: 3783306) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success 
= self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 255 (local_rank: 7) exitcode : 1 (pid: 1579376) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 111 (local_rank: 7) exitcode : 1 (pid: 1442009) error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 167 (local_rank: 7) exitcode : 1 (pid: 369552) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/53.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3783299) error_file: /tmp/torchelastic_o3doiy0q/none_25p9izjn/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 (pid: 2016063) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/2/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/65.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1579369) error_file: /tmp/torchelastic_y7x6cenb/none_zftq9jj5/attempt_0/0/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/29.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam19-ib0 rank : 104 (local_rank: 0) exitcode : 1 (pid: 1442002) error_file: /tmp/torchelastic_qcfis56r/none_shpq900c/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/43.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 369545) error_file: /tmp/torchelastic_opu99phk/none_uad8_372/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint raise ChildFailedError( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/52.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in 
_load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/64.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/28.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/42.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 59 (local_rank: 3) exitcode : 1 (pid: 2016064) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 81 (local_rank: 1) exitcode : 1 (pid: 2227031) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 25 (local_rank: 1) exitcode : 1 (pid: 3019359) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in 
load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2016065) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2016066) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/5/error.json traceback : Traceback (most recent call last): success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", 
line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert 
os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 62 (local_rank: 6) exitcode : 1 (pid: 2016067) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/6/error.json traceback : Traceback (most recent call last): assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2227032) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3019360) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 63 (local_rank: 7) exitcode : 1 (pid: 2016068) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/17.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam09-ib0 rank : 56 (local_rank: 0) exitcode : 1 (pid: 2016061) error_file: /tmp/torchelastic_1exp1xob/none_n9gnwva8/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2227033) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3019361) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/16.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid 
file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2227034) model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = 
load_checkpoint(model, optimizer, lr_scheduler) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3019362) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2227035) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in 
setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3019363) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
self._load_hp_checkpoint_state(checkpoint_folder) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 86 (local_rank: 6) exitcode : 1 (pid: 2227036) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3019364) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 87 (local_rank: 7) exitcode : 1 (pid: 2227037) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state 
lp.load_hp_checkpoint_state( success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 31 (local_rank: 7) exitcode : 1 (pid: 3019365) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} 
is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/23.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2227030) error_file: /tmp/torchelastic_xhmqo4pm/none_o0zt5__e/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/9.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam05-ib0 rank : 24 (local_rank: 0) exitcode : 1 (pid: 3019358) error_file: /tmp/torchelastic_xob3wfw6/none_apszoa9t/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/22.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/8.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2133737) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, 
load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2133738) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2133739) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2133740) return launch_agent(self._config, self._entrypoint, list(args)) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = 
setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2133741) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = 
self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 94 (local_rank: 6) exitcode : 1 (pid: 2133742) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 
rank : 95 (local_rank: 7) exitcode : 1 (pid: 2133743) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state raise ChildFailedError( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/25.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2133736) error_file: /tmp/torchelastic_8y6h7599/none_0io_3k0o/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/24.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py 
FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1713688) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict 
self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3591589) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1713689) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1713690) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1713691) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1713692) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 174 (local_rank: 6) exitcode : 1 (pid: 1713693) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, 
lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 175 (local_rank: 7) exitcode : 1 (pid: 1713694) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file [2]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3591590) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, 
load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/45.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1713687) error_file: /tmp/torchelastic_dou38qq4/none_gahv3zf4/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/44.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file [3]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 139 (local_rank: 3) exitcode : 1 (pid: 3591591) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.dense.weight/fp32.pt is not a valid file [4]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3591592) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [5]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3591593) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [6]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 142 (local_rank: 6) exitcode : 1 (pid: 3591594) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, 
lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 143 (local_rank: 7) exitcode : 1 (pid: 3591595) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/37.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:58 host : jean-zay-iam30-ib0 rank : 136 (local_rank: 0) exitcode : 1 (pid: 3591588) error_file: /tmp/torchelastic_swf7fghm/none_r4_vsl7h/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, 
load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/36.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.query_key_value.weight/fp32.pt is not a valid file [7]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 239 (local_rank: 7) exitcode : 1 (pid: 3039517) error_file: 
/tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint 
self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/61.self_attention.dense.weight/fp32.pt is not a valid file ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:48:59 host : jean-zay-iam42-ib0 rank : 232 (local_rank: 0) exitcode : 1 (pid: 3039510) error_file: /tmp/torchelastic_5wkah3wt/none_a9rt74rx/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 141, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 452, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/checkpointing.py", line 278, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir, load_module_only=not 
load_optimizer_states, load_optimizer_states=load_optimizer_states, load_lr_scheduler_states=load_optimizer_states) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2584, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 606, in load_state_dict self._load_universal_checkpoint(checkpoint_folder, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 643, in _load_universal_checkpoint self._load_hp_checkpoint_state(checkpoint_folder) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 659, in _load_hp_checkpoint_state lp.load_hp_checkpoint_state( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/bf16_optimizer.py", line 96, in load_hp_checkpoint_state assert os.path.isfile(file), f'{file} is not a valid file' AssertionError: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0/zero/60.self_attention.query_key_value.weight/fp32.pt is not a valid file ============================================================ srun: error: jean-zay-iam44: task 31: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=927259.0 srun: error: jean-zay-iam06: task 4: Exited with exit code 1 srun: error: jean-zay-iam27: task 15: Exited with exit code 1 srun: error: jean-zay-iam18: task 12: Exited with exit code 1 srun: error: jean-zay-iam40: task 27: Exited with exit code 1 srun: error: jean-zay-iam45: task 32: Exited with exit code 1 srun: error: jean-zay-iam46: 
task 33: Exited with exit code 1 srun: error: jean-zay-iam38: task 25: Exited with exit code 1 srun: error: jean-zay-iam19: task 13: Exited with exit code 1 srun: error: jean-zay-iam04: task 2: Exited with exit code 1 srun: error: jean-zay-iam42: task 29: Exited with exit code 1 srun: error: jean-zay-iam35: task 22: Exited with exit code 1 srun: error: jean-zay-iam47: task 34: Exited with exit code 1 srun: error: jean-zay-iam37: task 24: Exited with exit code 1 srun: error: jean-zay-iam41: task 28: Exited with exit code 1 srun: error: jean-zay-iam26: task 14: Exited with exit code 1 srun: error: jean-zay-iam13: task 9: Exited with exit code 1 srun: error: jean-zay-iam39: task 26: Exited with exit code 1 srun: error: jean-zay-iam11: task 8: Exited with exit code 1 srun: error: jean-zay-iam15: task 11: Exited with exit code 1 srun: error: jean-zay-iam09: task 7: Exited with exit code 1 srun: error: jean-zay-iam02: task 0: Exited with exit code 1 srun: error: jean-zay-iam08: task 6: Exited with exit code 1 srun: error: jean-zay-iam07: task 5: Exited with exit code 1 srun: error: jean-zay-iam33: task 20: Exited with exit code 1 srun: error: jean-zay-iam43: task 30: Exited with exit code 1 srun: error: jean-zay-iam03: task 1: Exited with exit code 1 srun: error: jean-zay-iam28: task 16: Exited with exit code 1 srun: error: jean-zay-iam05: task 3: Exited with exit code 1 srun: error: jean-zay-iam32: task 19: Exited with exit code 1 srun: error: jean-zay-iam52: task 35: Exited with exit code 1 srun: error: jean-zay-iam34: task 21: Exited with exit code 1 srun: error: jean-zay-iam30: task 17: Exited with exit code 1 srun: error: jean-zay-iam14: task 10: Exited with exit code 1 srun: error: jean-zay-iam36: task 23: Exited with exit code 1 srun: error: jean-zay-iam31: task 18: Exited with exit code 1 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being 
overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... 
True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. 
False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.927263.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 5 [default0]: eval_only ....................................... None [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... 
False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... 
None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... 
True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ 
False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. True [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 
1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid_ar', 'valid_ca', 'valid_code', 'valid_en', 'valid_es', 'valid_eu', 'valid_fr', 'valid_id', 'valid_indic-as', 'valid_indic-bn', 'valid_indic-gu', 'valid_indic-hi', 'valid_indic-kn', 'valid_indic-ml', 'valid_indic-mr', 'valid_indic-ne', 'valid_indic-or', 'valid_indic-pa', 'valid_indic-ta', 'valid_indic-te', 'valid_indic-ur', 'valid_nigercongo-all', 'valid_oscar-en', 'valid_oscar-zh', 'valid_pt', 'valid_vi', 'valid_zhs', 'valid_zht', 'valid'] [default0]: valid_weighted_split_paths ...................... 
[['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. 
loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... 
[default0]:[2022-09-03 18:56:05,285] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default7]:> setting tensorboard ... [default0]:> setting random seeds to 42 ... [default0]:[2022-09-03 18:56:11,773] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.093 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... 
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 6.766 seconds [default0]:time to initialize megatron (seconds): 50.634 [default0]:[after megatron is initialized] datetime: 2022-09-03 18:56:18 [default0]:building GPT model ... 
[default0]:[2022-09-03 18:56:18,678] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-03 18:56:18,679] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-03 18:56:18,679] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.97 GB, percent = 7.1% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287}
[default0]:[2022-09-03 18:56:22,594] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding
[default0]:stage=0 layers=3
[default0]: 0: _to_float16
[default0]: 1: EmbeddingPipe
[default0]: 2:
[default0]:stage=1 layers=1
[default0]: 3: ParallelTransformerLayerPipe
[default0]:stage=2 layers=1
[default0]: 4: ParallelTransformerLayerPipe
[default0]:stage=3 layers=1
[default0]: 5: ParallelTransformerLayerPipe
[default0]:stage=4 layers=1
[default0]: 6: ParallelTransformerLayerPipe
[default0]:stage=5 layers=1
[default0]: 7: ParallelTransformerLayerPipe
[default0]:stage=6 layers=1
[default0]: 8: ParallelTransformerLayerPipe
[default0]:stage=7 layers=1
[default0]: 9: ParallelTransformerLayerPipe
[default0]:stage=8 layers=1
[default0]: 10: ParallelTransformerLayerPipe
[default0]:stage=9 layers=1
[default0]: 11: ParallelTransformerLayerPipe
[default0]:stage=10 layers=1
[default0]: 12: ParallelTransformerLayerPipe
[default0]:stage=11 layers=1
[default0]: 13: ParallelTransformerLayerPipe
[default0]:stage=12 layers=1
[default0]: 14: ParallelTransformerLayerPipe
[default0]:stage=13 layers=1
[default0]: 15: ParallelTransformerLayerPipe
[default0]:stage=14 layers=1
[default0]: 16: ParallelTransformerLayerPipe
[default0]:stage=15 layers=1
[default0]: 17: ParallelTransformerLayerPipe
[default0]:stage=16 layers=1
[default0]: 18: ParallelTransformerLayerPipe
[default0]:stage=17 layers=1
[default0]: 19: ParallelTransformerLayerPipe
[default0]:stage=18 layers=1
[default0]: 20: ParallelTransformerLayerPipe
[default0]:stage=19 layers=1
[default0]: 21: ParallelTransformerLayerPipe
[default0]:stage=20 layers=1
[default0]: 22: ParallelTransformerLayerPipe
[default0]:stage=21 layers=1
[default0]: 23: ParallelTransformerLayerPipe
[default0]:stage=22 layers=1
[default0]: 24: ParallelTransformerLayerPipe
[default0]:stage=23 layers=1
[default0]: 25: ParallelTransformerLayerPipe
[default0]:stage=24 layers=1
[default0]: 26: ParallelTransformerLayerPipe
[default0]:stage=25 layers=1
[default0]: 27: ParallelTransformerLayerPipe
[default0]:stage=26 layers=1
[default0]: 28: ParallelTransformerLayerPipe
[default0]:stage=27 layers=1
[default0]: 29: ParallelTransformerLayerPipe
[default0]:stage=28 layers=1
[default0]: 30: ParallelTransformerLayerPipe
[default0]:stage=29 layers=1
[default0]: 31: ParallelTransformerLayerPipe
[default0]:stage=30 layers=1
[default0]: 32: ParallelTransformerLayerPipe
[default0]:stage=31 layers=1
[default0]: 33: ParallelTransformerLayerPipe
[default0]:stage=32 layers=1
[default0]: 34: ParallelTransformerLayerPipe
[default0]:stage=33 layers=1
[default0]: 35: ParallelTransformerLayerPipe
[default0]:stage=34 layers=1
[default0]: 36: ParallelTransformerLayerPipe
[default0]:stage=35 layers=1
[default0]: 37: ParallelTransformerLayerPipe
[default0]:stage=36 layers=1
[default0]: 38: ParallelTransformerLayerPipe
[default0]:stage=37 layers=1
[default0]: 39: ParallelTransformerLayerPipe
[default0]:stage=38 layers=1
[default0]: 40: ParallelTransformerLayerPipe
[default0]:stage=39 layers=1
[default0]: 41: ParallelTransformerLayerPipe
[default0]:stage=40 layers=1
[default0]: 42: ParallelTransformerLayerPipe
[default0]:stage=41 layers=1
[default0]: 43: ParallelTransformerLayerPipe
[default0]:stage=42 layers=1
[default0]: 44: ParallelTransformerLayerPipe
[default0]:stage=43 layers=1
[default0]: 45: ParallelTransformerLayerPipe
[default0]:stage=44 layers=1
[default0]: 46: ParallelTransformerLayerPipe
[default0]:stage=45 layers=1
[default0]: 47: ParallelTransformerLayerPipe
[default0]:stage=46 layers=1
[default0]: 48: ParallelTransformerLayerPipe
[default0]:stage=47 layers=1
[default0]: 49: ParallelTransformerLayerPipe
[default0]:stage=48 layers=1
[default0]: 50: ParallelTransformerLayerPipe
[default0]:stage=49 layers=1
[default0]: 51: ParallelTransformerLayerPipe
[default0]:stage=50 layers=1
[default0]: 52: ParallelTransformerLayerPipe
[default0]:stage=51 layers=1
[default0]: 53: ParallelTransformerLayerPipe
[default0]:stage=52 layers=1
[default0]: 54: ParallelTransformerLayerPipe
[default0]:stage=53 layers=1
[default0]: 55: ParallelTransformerLayerPipe
[default0]:stage=54 layers=1
[default0]: 56: ParallelTransformerLayerPipe
[default0]:stage=55 layers=1
[default0]: 57: ParallelTransformerLayerPipe
[default0]:stage=56 layers=1
[default0]: 58: ParallelTransformerLayerPipe
[default0]:stage=57 layers=1
[default0]: 59: ParallelTransformerLayerPipe
[default0]:stage=58 layers=1
[default0]: 60: ParallelTransformerLayerPipe
[default0]:stage=59 layers=1
[default0]: 61: ParallelTransformerLayerPipe
[default0]:stage=60 layers=1
[default0]: 62: ParallelTransformerLayerPipe
[default0]:stage=61 layers=1
[default0]: 63: ParallelTransformerLayerPipe
[default0]:stage=62 layers=1
[default0]: 64: ParallelTransformerLayerPipe
[default0]:stage=63 layers=1
[default0]: 65: ParallelTransformerLayerPipe
[default0]:stage=64 layers=1
[default0]: 66: ParallelTransformerLayerPipe
[default0]:stage=65 layers=1
[default0]: 67: ParallelTransformerLayerPipe
[default0]:stage=66 layers=1
[default0]: 68: ParallelTransformerLayerPipe
[default0]:stage=67 layers=1
[default0]: 69: ParallelTransformerLayerPipe
[default0]:stage=68 layers=1
[default0]: 70: ParallelTransformerLayerPipe
[default0]:stage=69 layers=1
[default0]: 71: ParallelTransformerLayerPipe
[default0]:stage=70 layers=3
[default0]: 72: ParallelTransformerLayerPipe
[default0]: 73: undo
[default0]: 74: MixedFusedLayerNorm
[default0]:stage=71 layers=2
[default0]: 75: EmbeddingPipe
[default0]: 76: float16_to_fp32
[default0]: loss: CrossEntropy
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default1]:Building extension module utils...
[default1]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default1]:ninja: no work to do.
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.3512558937072754 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.3512592315673828 seconds
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.35133790969848633 seconds
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.39713191986083984 seconds
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.3960716724395752 seconds
[default5]:Loading extension module utils...
[default6]:Loading extension module utils...
[default5]:Time to load utils op: 0.3960230350494385 seconds
[default6]:Time to load utils op: 0.39569997787475586 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.35126256942749023 seconds
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.13150644302368164 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.13130712509155273 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.13112926483154297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12450981140136719 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.12441682815551758 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12466001510620117 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12462115287780762 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.13142085075378418 seconds
[default0]:[2022-09-03 18:56:24,330] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-03 18:56:24,330] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-03 18:56:24,330] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.36 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-03 18:56:24,331] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default2]:Building extension module utils...
[default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default2]:ninja: no work to do.
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.4258749485015869 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.3746330738067627 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.3737320899963379 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.3668954372406006 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.3672611713409424 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.36745691299438477 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.36729907989501953 seconds
[default1]:Loading extension module utils...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.3782918453216553 seconds [default1]:Time to load utils op: 0.3784933090209961 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3842747211456299 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.37369728088378906 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.39353132247924805 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3938324451446533 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3936753273010254 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.39385032653808594 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3738231658935547 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3886232376098633 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3837459087371826 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3684053421020508 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.48447084426879883 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.48540186882019043 seconds [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4848141670227051 seconds [default0]:Time to load utils op: 0.4850742816925049 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4193716049194336 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.36821579933166504 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.41933417320251465 seconds [default1]:Loading extension module utils... 
[default2]:Loading extension module utils... [default1]:Time to load utils op: 0.3875730037689209 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.36777472496032715 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3675682544708252 seconds [default2]:Time to load utils op: 0.386244535446167 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.5004410743713379 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.500251054763794 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3747999668121338 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4299776554107666 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.5027351379394531 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38724684715270996 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4200613498687744 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.41943883895874023 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4291048049926758 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.42881059646606445 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4065425395965576 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.40607643127441406 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.40619611740112305 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004601478576660156 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.49999284744262695 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.5028476715087891 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4417994022369385 seconds [default3]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37926721572875977 seconds [default3]:Time to load utils op: 0.3786468505859375 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3896498680114746 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3895986080169678 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.39033007621765137 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4423041343688965 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38938021659851074 seconds [default5]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4419276714324951 seconds [default5]:Time to load utils op: 0.44231152534484863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.40637636184692383 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.437542200088501 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.364825963973999 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3840620517730713 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4786109924316406 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3750617504119873 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3744626045227051 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3876769542694092 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.39391183853149414 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.39355015754699707 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3886268138885498 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3894360065460205 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.503079891204834 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37560510635375977 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3876760005950928 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.38379430770874023 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4375772476196289 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3422844409942627 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3935518264770508 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3939366340637207 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4375779628753662 seconds [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.5027258396148682 seconds [default4]:Time to load utils op: 0.43613553047180176 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.42894697189331055 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3585481643676758 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3960099220275879 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3832974433898926 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38338518142700195 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3938765525817871 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.43773865699768066 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3585038185119629 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3645668029785156 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35775160789489746 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.36370229721069336 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3947479724884033 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.38358545303344727 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.36408495903015137 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3960106372833252 seconds [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3955509662628174 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3847806453704834 seconds [default1]:Time to load utils op: 0.38407421112060547 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.36449742317199707 seconds [default4]:Loading extension module utils... [default6]:Loading extension module utils... [default4]:Time to load utils op: 0.36449670791625977 seconds [default6]:Time to load utils op: 0.3640115261077881 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35800671577453613 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.43552350997924805 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.40088438987731934 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4355335235595703 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4914388656616211 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.43544960021972656 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4206690788269043 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4917895793914795 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4005763530731201 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4276893138885498 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.39422607421875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.39464712142944336 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.43450355529785156 seconds [default1]:Time to load utils op: 0.4346649646759033 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.39465785026550293 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.43698859214782715 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.39348912239074707 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.40026283264160156 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4277186393737793 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37492895126342773 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3416256904602051 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3956031799316406 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4277374744415283 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4205012321472168 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4370691776275635 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3835463523864746 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.34116363525390625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4371016025543213 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3412764072418213 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.40031003952026367 seconds [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.37410664558410645 seconds [default2]:Time to load utils op: 0.3743455410003662 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3741586208343506 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.35766100883483887 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.36394214630126953 seconds [default0]:Loading extension module utils... [default6]:Loading extension module utils... [default0]:Time to load utils op: 0.37487053871154785 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3741328716278076 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3740825653076172 seconds [default6]:Time to load utils op: 0.38445019721984863 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.43439245223999023 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4344358444213867 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.375185489654541 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37134623527526855 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3851494789123535 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.35785913467407227 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3733386993408203 seconds [default3]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35794687271118164 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.38439297676086426 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.38469886779785156 seconds [default3]:Time to load utils op: 0.37372922897338867 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4212160110473633 seconds [default2]:Loading extension module utils... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.4163365364074707 seconds [default2]:Time to load utils op: 0.41600465774536133 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.42051148414611816 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.49097108840942383 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.49135780334472656 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.358090877532959 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.42768311500549316 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.47680187225341797 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4129929542541504 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.41222095489501953 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.37823987007141113 seconds [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.41210150718688965 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4764888286590576 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.4765598773956299 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.37079691886901855 seconds [default4]:Time to load utils op: 0.4766206741333008 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.37102627754211426 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.37736940383911133 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.35926008224487305 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.437061071395874 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3581986427307129 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.35796332359313965 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35742902755737305 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.35963010787963867 seconds [default5]:Loading extension module utils... [default2]:Time to load utils op: 0.3576631546020508 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.38773632049560547 seconds [default5]:Time to load utils op: 0.35953807830810547 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4134964942932129 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.36992835998535156 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35907816886901855 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.37685728073120117 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3860640525817871 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3766207695007324 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default4]:Time to load utils op: 0.38794779777526855 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38605213165283203 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3865065574645996 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.38769006729125977 seconds [default5]:Time to load utils op: 0.44396448135375977 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.44514012336730957 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.38770341873168945 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3876922130584717 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3582484722137451 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3560805320739746 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.412808895111084 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3575150966644287 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.35799193382263184 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.44520092010498047 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3578197956085205 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.41743016242980957 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4786674976348877 seconds [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3745436668395996 seconds [default7]:Time to load utils op: 0.3739206790924072 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4128682613372803 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4127488136291504 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.41620922088623047 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.48593735694885254 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.4859001636505127 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.35253310203552246 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.438265323638916 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.43848204612731934 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.35685253143310547 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.44480371475219727 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.41269755363464355 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.36115288734436035 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3783266544342041 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.36156272888183594 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35803747177124023 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.35173892974853516 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.37065744400024414 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35221290588378906 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3750941753387451 seconds [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3699777126312256 seconds [default2]:Time to load utils op: 0.3522679805755615 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.4497823715209961 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35813355445861816 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4142155647277832 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4852421283721924 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.4385850429534912 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.48528265953063965 seconds [default7]:Time to load utils op: 0.4384746551513672 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.37474989891052246 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3764626979827881 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3750917911529541 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3751218318939209 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.37746191024780273 seconds [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.375058650970459 seconds [default7]:Time to load utils op: 0.3744363784790039 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.41410160064697266 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.36014819145202637 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3696136474609375 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3697776794433594 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.37473344802856445 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3765120506286621 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.37642335891723633 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3569481372833252 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3764536380767822 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3567831516265869 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.35813236236572266 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3589186668395996 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3925340175628662 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3751099109649658 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.4140756130218506 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.449751615524292 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.44978761672973633 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.39224767684936523 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.44978809356689453 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3823728561401367 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3811054229736328 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3826179504394531 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.4097888469696045 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.39211249351501465 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3770301342010498 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3931922912597656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.38194847106933594 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3603835105895996 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4142634868621826 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3783295154571533 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.47859907150268555 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.47870397567749023 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.41023874282836914 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.4104902744293213 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.40993571281433105 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005733966827392578 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005674362182617188 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005161762237548828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006039142608642578 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005300045013427734 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005390644073486328 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0007028579711914062 seconds [default1]:Time to load utils op: 0.0005350112915039062 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006597042083740234 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005784034729003906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005409717559814453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006642341613769531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007011890411376953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006318092346191406 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006914138793945312 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006070137023925781 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007114410400390625 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007388591766357422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007426738739013672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005412101745605469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000637054443359375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006396770477294922 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008366107940673828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005729198455810547 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009393692016601562 seconds [default1]:Time to load utils op: 0.0005714893341064453 seconds [default5]:Time to load utils op: 0.0006067752838134766 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005054473876953125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005390644073486328 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005574226379394531 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000530242919921875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006434917449951172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005648136138916016 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007321834564208984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006647109985351562 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006914138793945312 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000579833984375 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005352497100830078 seconds [default7]:Time to load utils op: 0.0005726814270019531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00036144256591796875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006887912750244141 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005719661712646484 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004951953887939453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00038743019104003906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008590221405029297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007462501525878906 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006909370422363281 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005886554718017578 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004851818084716797 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00061798095703125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006108283996582031 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007507801055908203 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000713348388671875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006809234619140625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006089210510253906 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007164478302001953 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000579833984375 seconds [default5]:Time to load utils op: 0.0005438327789306641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005357265472412109 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008993148803710938 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005326271057128906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005505084991455078 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006067752838134766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005159378051757812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012145042419433594 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0011858940124511719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005016326904296875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006616115570068359 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007412433624267578 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006070137023925781 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000598907470703125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006492137908935547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000537872314453125 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005731582641601562 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008115768432617188 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005109310150146484 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00048804283142089844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0031003952026367188 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006830692291259766 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000835418701171875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005817413330078125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007393360137939453 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006012916564941406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005590915679931641 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006890296936035156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005080699920654297 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008516311645507812 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006389617919921875 seconds [default4]:Time to load utils op: 0.0006110668182373047 seconds [default2]:Time to load utils op: 0.0005686283111572266 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007665157318115234 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000934600830078125 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005497932434082031 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008590221405029297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006518363952636719 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006735324859619141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005857944488525391 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000782012939453125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007097721099853516 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004744529724121094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008931159973144531 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006563663482666016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0008056163787841797 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008716583251953125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006747245788574219 seconds [default0]:Time to load utils op: 0.0005831718444824219 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004210472106933594 seconds [default7]:Time to load utils op: 0.00046324729919433594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0006189346313476562 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008573532104492188 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Time to load utils op: 0.0005040168762207031 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006227493286132812 seconds [default0]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006432533264160156 seconds [default0]:Time to load utils op: 0.0006849765777587891 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0007412433624267578 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009768009185791016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007088184356689453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005626678466796875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008671283721923828 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006413459777832031 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0008103847503662109 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005316734313964844 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00072479248046875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006225109100341797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007076263427734375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0007479190826416016 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006639957427978516 seconds [default6]:Time to load utils op: 0.0005822181701660156 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Time to load utils op: 0.0005228519439697266 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006325244903564453 seconds [default1]:Time to load utils op: 0.0004761219024658203 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008211135864257812 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005831718444824219 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007028579711914062 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00069427490234375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005125999450683594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006656646728515625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006859302520751953 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005466938018798828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default1]:Loading extension module utils... [default6]:Time to load utils op: 0.0005724430084228516 seconds [default1]:Time to load utils op: 0.0006020069122314453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005340576171875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007588863372802734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007996559143066406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005602836608886719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006811618804931641 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005877017974853516 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005996227264404297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0004978179931640625 seconds [default2]:Time to load utils op: 0.0003821849822998047 seconds [default6]:Time to load utils op: 0.0005671977996826172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007269382476806641 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006463527679443359 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006880760192871094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005970001220703125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007224082946777344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005471706390380859 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007486343383789062 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005662441253662109 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007181167602539062 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005202293395996094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005753040313720703 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006251335144042969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006940364837646484 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005452632904052734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005311965942382812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004954338073730469 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007007122039794922 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006535053253173828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005540847778320312 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007348060607910156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00048065185546875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005588531494140625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Time to load utils op: 0.0006175041198730469 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008301734924316406 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009298324584960938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008020401000976562 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0012001991271972656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005476474761962891 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0007143020629882812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004904270172119141 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008966922760009766 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006954669952392578 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008361339569091797 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006191730499267578 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default5]:Time to load utils op: 0.0009226799011230469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006718635559082031 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Time to load utils op: 0.0007085800170898438 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004992485046386719 seconds [default6]:Time to load utils op: 0.0005869865417480469 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008482933044433594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0004696846008300781 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005483627319335938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000865936279296875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005271434783935547 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006594657897949219 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006861686706542969 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0007262229919433594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007984638214111328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007395744323730469 seconds [default7]:Time to load utils op: 0.0007200241088867188 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006721019744873047 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005373954772949219 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004949569702148438 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004875659942626953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005054473876953125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000751495361328125 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005733966827392578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0007119178771972656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.001008749008178711 seconds [default5]:Time to load utils op: 0.0005388259887695312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000545501708984375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008273124694824219 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007023811340332031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00060272216796875 seconds [default4]:Time to load utils op: 0.0009183883666992188 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008335113525390625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006401538848876953 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005738735198974609 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0005924701690673828 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006248950958251953 seconds [default6]:Time to load utils op: 0.0006229877471923828 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0011293888092041016 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007519721984863281 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008170604705810547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006053447723388672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007872581481933594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004665851593017578 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00049591064453125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00058746337890625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006086826324462891 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007479190826416016 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006589889526367188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007236003875732422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005822181701660156 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006120204925537109 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008301734924316406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008518695831298828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006160736083984375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006241798400878906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005640983581542969 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006420612335205078 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006494522094726562 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006463527679443359 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Time to load utils op: 0.0006654262542724609 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005159378051757812 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006258487701416016 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007922649383544922 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011944770812988281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006005764007568359 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0008566379547119141 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000997781753540039 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0010869503021240234 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008740425109863281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008499622344970703 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006685256958007812 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005135536193847656 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000762939453125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006966590881347656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000881195068359375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005221366882324219 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010063648223876953 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005855560302734375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008730888366699219 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005564689636230469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006017684936523438 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007998943328857422 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005788803100585938 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0008018016815185547 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000675201416015625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009539127349853516 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 18:56:25,056] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 18:56:25,056] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 18:56:25,056] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 18:56:25,056] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 18:56:25,056] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 18:56:25,101] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 18:56:25,101] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:56:25,101] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20726251602172852 seconds [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21564793586730957 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20627069473266602 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20687341690063477 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2024846076965332 seconds [default0]:[2022-09-03 18:56:25,334] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-03 18:56:25,335] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:56:25,335] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3043515682220459 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3036971092224121 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3039400577545166 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00046753883361816406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0015308856964111328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0016124248504638672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0014922618865966797 seconds [default0]:[2022-09-03 18:56:25,400] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 18:56:25,400] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:56:25,400] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:[2022-09-03 18:56:25,427] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 18:56:25,427] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:56:25,427] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:[2022-09-03 18:56:25,455] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 18:56:25,455] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:56:25,455] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004088878631591797 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003421306610107422 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003733634948730469 seconds [default0]:[2022-09-03 18:56:25,482] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 18:56:25,483] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:56:25,483] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:[2022-09-03 18:56:25,555] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 18:56:25,556] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:56:25,556] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:[2022-09-03 18:56:25,582] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 18:56:25,583] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:56:25,583] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 7.3% [default0]:[2022-09-03 18:56:25,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-03 18:56:25,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 18:56:25,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 18:56:25,583] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 18:56:25,583] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, 
[default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] bfloat16_enabled ............. 
True [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] disable_allgather ............ 
False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_layer_name ........ bert.encoder.layer [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] gradient_accumulation_steps .. 
512 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] loss_scale ................... 1.0 [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 18:56:25,584] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] scheduler_name ............... 
None [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] train_batch_size ............. 2048 [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-03 18:56:25,585] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004544258117675781 seconds [default0]:[2022-09-03 18:56:25,586] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,182] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:56:26,183] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954296 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890851 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020122 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631574 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931162 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890852 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954297 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016829 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960653 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631575 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931163 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020123 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960654 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971422 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016830 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637359 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248418 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980436 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890853 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971423 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419927 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637360 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442768 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020124 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248419 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980437 closing signal SIGTERM
slurmstepd: error: *** STEP 927263.0 ON jean-zay-iam02 CANCELLED AT 2022-09-03T18:56:26 ***
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931164 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020125 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592422 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419928 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442769 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960655 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016831 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020126 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134652 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954298 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592423 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512621 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513499 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process
3608026 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971424 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248420 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020127 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890854 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370298 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637361 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980438 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512622 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954299 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513500 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631576 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020128 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134653 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954300 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931165 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370299 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3608027 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442770 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227831 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419929 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631577 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016832 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931166 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3020129 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954301 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592424 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784133 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227832 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631578 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931167 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016833 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954302 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248421 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513501 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971425 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512623 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting 
down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631579 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931168 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784134 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016834 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3954303 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890855 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551927 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134654 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714489 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370300 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980439 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631580 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2931169 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016835 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960656 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890856 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980440 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 
1714490 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551928 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373716 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3631581 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419930 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227833 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2016836 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670008 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890857 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980441 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800302 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512624 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248422 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152772 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319361 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 3608028 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971426 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637362 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373717 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980442 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1890858 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784135 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670009 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580171 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512625 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370301 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442771 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981355 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800303 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1980443 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152773 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971427 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040275 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512626 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592425 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134655 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513502 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714491 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580172 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319362 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637363 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981356 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408008 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512627 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971428 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927574 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914236 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 512628 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248423 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637364 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040276 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1971429 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408009 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3608029 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670010 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373718 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248424 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800304 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637365 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152774 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419931 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3608030 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551929 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442772 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592426 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513503 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 248425 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134656 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914237 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960657 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2637366 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319363 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927575 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580173 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3608031 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592427 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442773 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513504 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592428 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3608032 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408010 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778016 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784136 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370302 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513505 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927576 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3592429 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442774 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 3608033 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 513506 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551930 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152775 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914238 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319364 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778017 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1442775 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714492 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551931 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227834 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981357 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319365 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960658 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040277 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551932 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784137 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319366 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551933 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670011 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419932 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134657 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981358 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373719 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1551934 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800305 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784138 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319367 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778018 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152776 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784139 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914239 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1319368 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419933 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778019 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134658 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152777 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800306 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927577 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714493 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040278 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3784140 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580174 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408011 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152778 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778020 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800307 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227835 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670012 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927578 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040279 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3152779 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800308 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778021 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373720 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408012 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670013 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040280 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1800309 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373721 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370303 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408013 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778022 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670014 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040281 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960659 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373722 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1778023 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408014 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2670015 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714494 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3040282 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1373723 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408015 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981359 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714495 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914240 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370304 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1714496 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 419934 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2134659 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227836 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1960660 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227837 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2227838 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580175 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927579 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914241 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580176 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 370305 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927580 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580177 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 927581 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1580178 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981360 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914242 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981361 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3914243 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2981362 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634600 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634601 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634602 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634603 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634604 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634605 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634606 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634607 closing signal SIGTERM
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[default7]:> setting tensorboard ...
[default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72
[default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type.
[default0]:using torch.bfloat16 for parameters ...
[default0]:------------------------ arguments ------------------------
[default0]: abort_on_unmet_fused_kernel_constraints ......... True
[default0]: accumulate_allreduce_grads_in_fp32 .............. True
[default0]: adam_beta1 ...................................... 0.9
[default0]: adam_beta2 ...................................... 0.95
[default0]: adam_eps ........................................ 1e-08
[default0]: adlr_autoresume ................................. False
[default0]: adlr_autoresume_interval ........................ 1000
[default0]: apply_query_key_layer_scaling ................... True
[default0]: apply_residual_connection_post_layernorm ........ False
[default0]: attention_dropout ............................... 0.1
[default0]: attention_softmax_in_fp32 ....................... False
[default0]: bert_binary_head ................................ True
[default0]: bert_load ....................................... None
[default0]: bf16 ............................................ True
[default0]: bias_dropout_fusion ............................. True
[default0]: bias_gelu_fusion ................................ True
[default0]: biencoder_projection_dim ........................ 0
[default0]: biencoder_shared_query_context_model ............ False
[default0]: block_data_path ................................. None
[default0]: checkpoint_activations .......................... True
[default0]: checkpoint_in_cpu ............................... False
[default0]: checkpoint_num_layers ........................... 1
[default0]: clip_grad ....................................... 1.0
[default0]: codecarbon_dir .................................. None
[default0]: consumed_train_samples .......................... 0
[default0]: consumed_train_tokens ........................... 0
[default0]: consumed_valid_samples .......................... 0
[default0]: contigious_checkpointing ........................ False
[default0]: cpu_optimizer ................................... False
[default0]: cpu_torch_adam .................................. False
[default0]: curriculum_learning ............................. False
[default0]: data_impl ....................................... mmap
[default0]: data_parallel_size .............................. 4
[default0]: data_path ....................................... None
[default0]: dataloader_type ................................. single
[default0]: DDP_impl ........................................ local
[default0]: decoder_seq_length .............................. None
[default0]: deepscale ....................................... False
[default0]: deepscale_config ................................ None
[default0]: deepspeed ....................................... True
[default0]: deepspeed_activation_checkpointing .............. True
[default0]: deepspeed_config ................................ ./ds_config.927268.json
[default0]: deepspeed_mpi ................................... False
[default0]: distribute_checkpointed_activations ............. False
[default0]: distributed_backend ............................. nccl
[default0]: embed_layernorm ................................. True
[default0]: embedding_path .................................. None
[default0]: encoder_seq_length .............................. 2048
[default0]: eod_mask_loss ................................... False
[default0]: eval_interval ................................... 250
[default0]: eval_iters ...................................... 5
[default0]: eval_only ....................................... None
[default0]: evidence_data_path .............................. None
[default0]: exit_duration_in_mins ........................... 5990
[default0]: exit_interval ................................... None
[default0]: ffn_hidden_size ................................. 57344
[default0]: finetune ........................................ False
[default0]: fp16 ............................................ False
[default0]: fp16_lm_cross_entropy ........................... False
[default0]: fp32_residual_connection ........................ False
[default0]: gigaflos_no_embeds .............................. 0
[default0]: global_batch_size ............................... 2048
[default0]: glu_activation .................................. None
[default0]: hidden_dropout .................................. 0.1
[default0]: hidden_size ..................................... 14336
[default0]: hysteresis ...................................... 2
[default0]: ict_head_size ................................... None
[default0]: ict_load ........................................ None
[default0]: img_dim ......................................... 224
[default0]: indexer_batch_size .............................. 128
[default0]: indexer_log_interval ............................ 1000
[default0]: inference ....................................... False
[default0]: init_method_std ................................. 0.0048
[default0]: init_method_xavier_uniform ...................... False
[default0]: initial_loss_scale .............................. 4294967296
[default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf
[default0]: kv_channels ..................................... 128
[default0]: layernorm_epsilon ............................... 1e-05
[default0]: lazy_mpu_init ................................... None
[default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: local_rank ...................................... None
[default0]: log_batch_size_to_tensorboard ................... True
[default0]: log_interval .................................... 1
[default0]: log_learning_rate_to_tensorboard ................ True
[default0]: log_level ....................................... None
[default0]: log_level_replica ............................... None
[default0]: log_loss_scale_to_tensorboard ................... True
[default0]: log_num_zeros_in_grad ........................... False
[default0]: log_params_norm ................................. False
[default0]: log_path ........................................ None
[default0]: log_timers_to_tensorboard ....................... True
[default0]: log_validation_ppl_to_tensorboard ............... True
[default0]: loss_on_targets_only ............................ False
[default0]: loss_scale ...................................... None
[default0]: loss_scale_window ............................... 1000
[default0]: lr .............................................. 2e-05
[default0]: lr_decay_iters .................................. None
[default0]: lr_decay_samples ................................ None
[default0]: lr_decay_style .................................. constant
[default0]: lr_decay_tokens ................................. None
[default0]: lr_warmup_fraction .............................. None
[default0]: lr_warmup_iters ................................. 0
[default0]: lr_warmup_samples ............................... 0
[default0]: make_vocab_size_divisible_by .................... 128
[default0]: mask_prob ....................................... 0.15
[default0]: masked_softmax_fusion ........................... True
[default0]: max_position_embeddings ......................... 2048
[default0]: mean_noise_span_length .......................... None
[default0]: memory_centric_tiled_linear ..................... False
[default0]: merge_file ...................................... None
[default0]: micro_batch_size ................................ 1
[default0]: min_loss_scale .................................. 1.0
[default0]: min_lr .......................................... 0.0
[default0]: mmap_warmup ..................................... False
[default0]: no_load_optim ................................... True
[default0]: no_load_rng ..................................... None
[default0]: no_save_optim ................................... None
[default0]: no_save_rng ..................................... None
[default0]: noise_density ................................... None
[default0]: norm_target_loss ................................ True
[default0]: num_attention_heads ............................. 112
[default0]: num_channels .................................... 3
[default0]: num_classes ..................................... 1000
[default0]: num_layers ...................................... 70
[default0]: num_layers_per_virtual_pipeline_stage ........... None
[default0]: num_workers ..................................... 2
[default0]: onnx_safe ....................................... None
[default0]: openai_gelu ..................................... False
[default0]: optimizer ....................................... adam
[default0]: override_lr_scheduler ........................... False
[default0]: pad_vocab_size_to ............................... 250880
[default0]: params_dtype .................................... torch.bfloat16
[default0]: partition_activations ........................... False
[default0]: patch_dim ....................................... 16
[default0]: pipeline_model_parallel_size .................... 72
[default0]: position_embedding_type ......................... PositionEmbeddingType.alibi
[default0]: pp_partition_method ............................. type:transformer|embedding
[default0]: prefixlm ........................................ False
[default0]: profile_backward ................................ False
[default0]: query_in_block_prob ............................. 0.1
[default0]: rampup_batch_size ............................... None
[default0]: rank ............................................ 0
[default0]: remote_device ................................... none
[default0]: reset_attention_mask ............................ False
[default0]: reset_position_ids .............................. False
[default0]: reset_progress .................................. True
[default0]: retriever_report_topk_accuracies ................ []
[default0]: retriever_score_scaling ......................... False
[default0]: retriever_seq_length ............................ 256
[default0]: reweight_loss_based_on_position_frequency ....... False
[default0]: sample_rate ..................................... 1.0
[default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: save_interval ................................... 5
[default0]: scatter_gather_tensors_in_pipeline .............. True
[default0]: scattered_embeddings ............................ False
[default0]: seed ............................................ 42
[default0]: seq_length ...................................... 2048
[default0]: sgd_momentum .................................... 0.9
[default0]: short_seq_prob .................................. 0.1
[default0]: skip_train_iteration_range ...................... None
[default0]: split ........................................... None
[default0]: split_transformers .............................. False
[default0]: sync_tp_duplicated_parameters ................... True
[default0]: synchronize_each_layer .......................... False
[default0]: tensor_model_parallel_size ...................... 1
[default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq
[default0]: tensorboard_log_interval ........................ 1
[default0]: tensorboard_queue_size .......................... 5
[default0]: test_weighted_split_paths ....................... None
[default0]: test_weighted_split_paths_path .................. None
[default0]: tile_factor ..................................... 1
[default0]: titles_data_path ................................ None
[default0]: tokenizer_name_or_path .......................... bigscience/tokenizer
[default0]: tokenizer_type .................................. PretrainedFromHF
[default0]: train_iters ..................................... None
[default0]: train_samples ................................... 6348800
[default0]: train_tokens .................................... None
[default0]: train_weighted_split_names ...................... ['train']
[default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']]
[default0]: train_weighted_split_paths_path ................. None
[default0]: train_weighted_split_splits ..................... [['0:1']]
[default0]: train_weighted_split_weights .................... [['1']]
[default0]: universal_checkpoint ............................ True
[default0]: use_bnb_optimizer ............................... False
[default0]: use_checkpoint_lr_scheduler ..................... False
[default0]: use_contiguous_buffers_in_ddp ................... True
[default0]: use_cpu_initialization .......................... None
[default0]: use_one_sent_docs ............................... False
[default0]: use_pin_memory .................................. False
[default0]: valid_num_workers ............................... 2
[default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid_ar', 'valid_ca', 'valid_code', 'valid_en', 'valid_es', 'valid_eu', 'valid_fr', 'valid_id', 'valid_indic-as', 'valid_indic-bn', 'valid_indic-gu', 'valid_indic-hi', 'valid_indic-kn', 'valid_indic-ml', 'valid_indic-mr', 'valid_indic-ne', 'valid_indic-or', 'valid_indic-pa', 'valid_indic-ta', 'valid_indic-te', 'valid_indic-ur', 'valid_nigercongo-all', 'valid_oscar-en', 'valid_oscar-zh', 'valid_pt', 'valid_vi', 'valid_zhs', 'valid_zht', 'valid']
[default0]: valid_weighted_split_paths ......................
[['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], 
['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document'], ['/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1']]
[default0]: virtual_pipeline_model_parallel_size ............ None
[default0]: vocab_extra_ids ................................. 0
[default0]: vocab_file ...................................... None
[default0]: weight_decay .................................... 0.0001
[default0]: world_size ...................................... 288
[default0]: zero_allgather_bucket_size ...................... 0.0
[default0]: zero_contigious_gradients ....................... False
[default0]: zero_reduce_bucket_size ......................... 0.0
[default0]: zero_reduce_scatter ............................. False
[default0]: zero_stage ...................................... 0
[default0]:-------------------- end of arguments ---------------------
[default0]:setting number of micro-batches to constant 512
[default0]:> building PretrainedFromHF tokenizer ...
[default0]: vocab file is un-used.
loading tokenizer from pre-trained model
[default0]:Offline mode: forcing local_files_only=True
[default0]:Offline mode: forcing local_files_only=True
[default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03
[default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None
[default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd
[default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a
[default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880)
[default0]:DeepSpeed general environment info:
[default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch']
[default0]:torch version .................... 1.12.0
[default0]:torch cuda version ............... 11.3
[default0]:torch hip version ................ None
[default0]:nvcc version ..................... 11.4
[default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed']
[default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master
[default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
[default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival ****
[default0]:> initializing torch distributed ...
[default0]:[2022-09-03 18:57:38,667] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[default0]:> initializing tensor model parallel with size 1
[default0]:> initializing pipeline model parallel with size 72
[default0]:> setting random seeds to 42 ...
[default0]:[2022-09-03 18:57:48,662] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
[default0]:> compiling dataset index builder ...
[default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data'
[default0]:make: Nothing to be done for 'default'.
[default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data'
[default0]:>>> done with dataset index builder. Compilation time: 0.090 seconds
[default0]:> compiling and loading fused kernels ...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module scaled_upper_triang_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module scaled_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module scaled_masked_softmax_cuda...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module fused_mix_prec_layer_norm_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module fused_mix_prec_layer_norm_cuda...
[default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.223 seconds
[default0]:time to initialize megatron (seconds): 19.977
[default0]:[after megatron is initialized] datetime: 2022-09-03 18:57:55
[default0]:building GPT model ...
[default0]:[2022-09-03 18:57:56,020] [INFO] [utils.py:827:see_memory_usage] Before Building Model
[default0]:[2022-09-03 18:57:56,021] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[default0]:[2022-09-03 18:57:56,021] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.96 GB, percent = 7.1%
[default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
[default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2,
model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, 
model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, 
ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, 
ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, 
ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, 
ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, 
ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287}
[default0]:[2022-09-03 18:57:59,935] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding
[default0]:stage=0 layers=3
[default0]: 0: _to_float16
[default0]: 1: EmbeddingPipe
[default0]: 2:
[default0]:stage=1 layers=1
[default0]: 3: ParallelTransformerLayerPipe
[default0]:stage=2 layers=1
[default0]: 4: ParallelTransformerLayerPipe
[default0]:stage=3 layers=1
[default0]: 5: ParallelTransformerLayerPipe
[default0]:stage=4 layers=1
[default0]: 6: ParallelTransformerLayerPipe
[default0]:stage=5 layers=1
[default0]: 7: ParallelTransformerLayerPipe
[default0]:stage=6 layers=1
[default0]: 8: ParallelTransformerLayerPipe
[default0]:stage=7 layers=1
[default0]: 9: ParallelTransformerLayerPipe
[default0]:stage=8 layers=1
[default0]: 10: ParallelTransformerLayerPipe
[default0]:stage=9 layers=1
[default0]: 11: ParallelTransformerLayerPipe
[default0]:stage=10 layers=1
[default0]: 12: ParallelTransformerLayerPipe
[default0]:stage=11 layers=1
[default0]: 13: ParallelTransformerLayerPipe
[default0]:stage=12 layers=1
[default0]: 14: ParallelTransformerLayerPipe
[default0]:stage=13 layers=1
[default0]: 15: ParallelTransformerLayerPipe
[default0]:stage=14 layers=1
[default0]: 16: ParallelTransformerLayerPipe
[default0]:stage=15 layers=1
[default0]: 17: ParallelTransformerLayerPipe
[default0]:stage=16 layers=1
[default0]: 18: ParallelTransformerLayerPipe
[default0]:stage=17 layers=1
[default0]: 19: ParallelTransformerLayerPipe
[default0]:stage=18 layers=1
[default0]: 20: ParallelTransformerLayerPipe
[default0]:stage=19 layers=1
[default0]: 21: ParallelTransformerLayerPipe
[default0]:stage=20 layers=1
[default0]: 22: ParallelTransformerLayerPipe
[default0]:stage=21 layers=1
[default0]: 23: ParallelTransformerLayerPipe
[default0]:stage=22 layers=1
[default0]: 24: ParallelTransformerLayerPipe
[default0]:stage=23 layers=1
[default0]: 25: ParallelTransformerLayerPipe
[default0]:stage=24 layers=1
[default0]: 26: ParallelTransformerLayerPipe
[default0]:stage=25 layers=1
[default0]: 27: ParallelTransformerLayerPipe
[default0]:stage=26 layers=1
[default0]: 28: ParallelTransformerLayerPipe
[default0]:stage=27 layers=1
[default0]: 29: ParallelTransformerLayerPipe
[default0]:stage=28 layers=1
[default0]: 30: ParallelTransformerLayerPipe
[default0]:stage=29 layers=1
[default0]: 31: ParallelTransformerLayerPipe
[default0]:stage=30 layers=1
[default0]: 32: ParallelTransformerLayerPipe
[default0]:stage=31 layers=1
[default0]: 33: ParallelTransformerLayerPipe
[default0]:stage=32 layers=1
[default0]: 34: ParallelTransformerLayerPipe
[default0]:stage=33 layers=1
[default0]: 35: ParallelTransformerLayerPipe
[default0]:stage=34 layers=1
[default0]: 36: ParallelTransformerLayerPipe
[default0]:stage=35 layers=1
[default0]: 37: ParallelTransformerLayerPipe
[default0]:stage=36 layers=1
[default0]: 38: ParallelTransformerLayerPipe
[default0]:stage=37 layers=1
[default0]: 39: ParallelTransformerLayerPipe
[default0]:stage=38 layers=1
[default0]: 40: ParallelTransformerLayerPipe
[default0]:stage=39 layers=1
[default0]: 41: ParallelTransformerLayerPipe
[default0]:stage=40 layers=1
[default0]: 42: ParallelTransformerLayerPipe
[default0]:stage=41 layers=1
[default0]: 43: ParallelTransformerLayerPipe
[default0]:stage=42 layers=1
[default0]: 44: ParallelTransformerLayerPipe
[default0]:stage=43 layers=1
[default0]: 45: ParallelTransformerLayerPipe
[default0]:stage=44 layers=1
[default0]: 46: ParallelTransformerLayerPipe
[default0]:stage=45 layers=1
[default0]: 47: ParallelTransformerLayerPipe
[default0]:stage=46 layers=1
[default0]: 48: ParallelTransformerLayerPipe
[default0]:stage=47 layers=1
[default0]: 49: ParallelTransformerLayerPipe
[default0]:stage=48 layers=1
[default0]: 50: ParallelTransformerLayerPipe
[default0]:stage=49 layers=1
[default0]: 51: ParallelTransformerLayerPipe
[default0]:stage=50 layers=1
[default0]: 52: ParallelTransformerLayerPipe
[default0]:stage=51 layers=1
[default0]: 53: ParallelTransformerLayerPipe
[default0]:stage=52 layers=1
[default0]: 54: ParallelTransformerLayerPipe
[default0]:stage=53 layers=1
[default0]: 55: ParallelTransformerLayerPipe
[default0]:stage=54 layers=1
[default0]: 56: ParallelTransformerLayerPipe
[default0]:stage=55 layers=1
[default0]: 57: ParallelTransformerLayerPipe
[default0]:stage=56 layers=1
[default0]: 58: ParallelTransformerLayerPipe
[default0]:stage=57 layers=1
[default0]: 59: ParallelTransformerLayerPipe
[default0]:stage=58 layers=1
[default0]: 60: ParallelTransformerLayerPipe
[default0]:stage=59 layers=1
[default0]: 61: ParallelTransformerLayerPipe
[default0]:stage=60 layers=1
[default0]: 62: ParallelTransformerLayerPipe
[default0]:stage=61 layers=1
[default0]: 63: ParallelTransformerLayerPipe
[default0]:stage=62 layers=1
[default0]: 64: ParallelTransformerLayerPipe
[default0]:stage=63 layers=1
[default0]: 65: ParallelTransformerLayerPipe
[default0]:stage=64 layers=1
[default0]: 66: ParallelTransformerLayerPipe
[default0]:stage=65 layers=1
[default0]: 67: ParallelTransformerLayerPipe
[default0]:stage=66 layers=1
[default0]: 68: ParallelTransformerLayerPipe
[default0]:stage=67 layers=1
[default0]: 69: ParallelTransformerLayerPipe
[default0]:stage=68 layers=1
[default0]: 70: ParallelTransformerLayerPipe
[default0]:stage=69 layers=1
[default0]: 71: ParallelTransformerLayerPipe
[default0]:stage=70 layers=3
[default0]: 72: ParallelTransformerLayerPipe
[default0]: 73: undo
[default0]: 74: MixedFusedLayerNorm
[default0]:stage=71 layers=2
[default0]: 75: EmbeddingPipe
[default0]: 76: float16_to_fp32
[default0]: loss: CrossEntropy
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default4]:Building extension module utils...
[default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default4]:ninja: no work to do.
[default4]:Loading extension module utils...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.24298810958862305 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.24299168586730957 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.24298310279846191 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.24298572540283203 seconds
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.3274855613708496 seconds
[default4]:Time to load utils op: 0.28615546226501465 seconds
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:No modifications detected for re-loaded extension module utils, skipping build step...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.0005102157592773438 seconds
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.3269314765930176 seconds
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.32715439796447754 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:No modifications detected for re-loaded extension module utils, skipping build step...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.0019278526306152344 seconds
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:No modifications detected for re-loaded extension module utils, skipping build step...
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.0018017292022705078 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:No modifications detected for re-loaded extension module utils, skipping build step...
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.0018184185028076172 seconds
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:No modifications detected for re-loaded extension module utils, skipping build step...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.001943349838256836 seconds
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:No modifications detected for re-loaded extension module utils, skipping build step...
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.00047779083251953125 seconds
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:No modifications detected for re-loaded extension module utils, skipping build step...
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.00048041343688964844 seconds
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:No modifications detected for re-loaded extension module utils, skipping build step...
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.0004627704620361328 seconds
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10512351989746094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10518717765808105 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10499691963195801 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10223627090454102 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10244178771972656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.1028897762298584 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10938310623168945 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.1089169979095459 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10280942916870117 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.1029517650604248 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.1025688648223877 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10900044441223145 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1032111644744873 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10512328147888184 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10274744033813477 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20587992668151855 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21000194549560547 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10242819786071777 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20952296257019043 seconds [default3]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20965313911437988 seconds [default3]:Time to load utils op: 0.2048795223236084 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.21104025840759277 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20548248291015625 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10320830345153809 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.1022481918334961 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10225296020507812 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10528779029846191 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default0]:Building extension module utils... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21150994300842285 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10228276252746582 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.2048795223236084 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10244035720825195 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10303544998168945 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10250639915466309 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10228514671325684 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10284686088562012 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10243535041809082 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10236811637878418 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10386157035827637 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10425043106079102 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1037745475769043 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10270166397094727 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10265398025512695 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.1043553352355957 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10297846794128418 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2055659294128418 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20607566833496094 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20501923561096191 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.1029503345489502 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10234975814819336 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10252761840820312 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20499181747436523 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10289287567138672 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10218691825866699 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10252618789672852 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1023714542388916 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10216307640075684 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10238313674926758 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10239338874816895 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10287308692932129 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10250210762023926 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20313501358032227 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2038724422454834 seconds [default0]:Time to load utils op: 0.20439553260803223 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20337986946105957 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10296249389648438 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.1029520034790039 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10243558883666992 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10392189025878906 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10226964950561523 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10350394248962402 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.1042482852935791 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.1025092601776123 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10245633125305176 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10237669944763184 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10252499580383301 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10283589363098145 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20981550216674805 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20280098915100098 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20980620384216309 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20257782936096191 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20732522010803223 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20287013053894043 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2025904655456543 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20674347877502441 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20659136772155762 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20672178268432617 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21061325073242188 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20992612838745117 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2025442123413086 seconds [default7]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20395183563232422 seconds [default7]:Time to load utils op: 0.20273160934448242 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2027757167816162 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20264649391174316 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20260310173034668 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20244169235229492 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20469284057617188 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.2026371955871582 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20261073112487793 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20241475105285645 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20286989212036133 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2041018009185791 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10252952575683594 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20385479927062988 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20271587371826172 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10213112831115723 seconds [default4]:Time to load utils op: 0.10996532440185547 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10228562355041504 seconds [default0]:Time to load utils op: 0.10386180877685547 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10264945030212402 seconds [default1]:Time to load utils op: 0.10343694686889648 seconds [default0]:Loading extension module utils... [default6]:Time to load utils op: 0.10242938995361328 seconds [default0]:Time to load utils op: 0.20649290084838867 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.2067563533782959 seconds
[default0]:[2022-09-03 18:58:01,677] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-03 18:58:01,677] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-03 18:58:01,677] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.35 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-03 18:58:01,678] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.10231876373291016 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.10241508483886719 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.10239076614379883 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.20647573471069336 seconds
[default2]:Time to load utils op: 0.10268163681030273 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.2064826488494873 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.2024211883544922 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.20266246795654297 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.2126758098602295 seconds
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.2031996250152588 seconds
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.20395779609680176 seconds
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.20670080184936523 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.20264077186584473 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.202622652053833 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20337867736816406 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0015444755554199219 seconds [default5]:Time to load utils op: 0.20406007766723633 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2035684585571289 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20247483253479004 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2039353847503662 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0017545223236083984 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0016465187072753906 seconds [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20237135887145996 seconds [default7]:Time to load utils op: 0.20242571830749512 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.2023475170135498 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2042531967163086 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20434355735778809 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2026517391204834 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20499229431152344 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20374298095703125 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2045440673828125 seconds [default3]:Time to load utils op: 0.20387506484985352 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20392775535583496 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20242905616760254 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2042829990386963 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20365500450134277 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20496273040771484 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2036275863647461 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2121584415435791 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0005609989166259766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006787776947021484 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20366930961608887 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20444798469543457 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20361661911010742 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20238399505615234 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20252108573913574 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2025012969970703 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20273756980895996 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20256257057189941 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30501389503479004 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10446834564208984 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004658699035644531 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20233440399169922 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.20259690284729004 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2028636932373047 seconds [default7]:Time to load utils op: 0.20244598388671875 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20378708839416504 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20254826545715332 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2118990421295166 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003364086151123047 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20273518562316895 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20468497276306152 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20267629623413086 seconds [default2]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20291471481323242 seconds [default2]:Time to load utils op: 0.20245862007141113 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20446038246154785 seconds [default6]:Time to load utils op: 0.20670294761657715 seconds [default5]:Time to load utils op: 0.20694470405578613 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20235204696655273 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30531859397888184 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.2023465633392334 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20234942436218262 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21199893951416016 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20213747024536133 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2022404670715332 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30472898483276367 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20257282257080078 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20691537857055664 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10489034652709961 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20254135131835938 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20258426666259766 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10366702079772949 seconds [default0]:Time to load utils op: 0.0005497932434082031 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10408449172973633 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20259404182434082 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10292243957519531 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.10390782356262207 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0016117095947265625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10285592079162598 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20235753059387207 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2039966583251953 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20247173309326172 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2031257152557373 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0016436576843261719 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.1024618148803711 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0015072822570800781 seconds [default6]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Time to load utils op: 0.20304489135742188 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.001669168472290039 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10244584083557129 seconds [default4]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20262765884399414 seconds [default4]:Time to load utils op: 0.20287418365478516 seconds [default5]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.10242795944213867 seconds [default5]:Time to load utils op: 0.20274639129638672 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20244717597961426 seconds [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024831771850586 seconds [default7]:Time to load utils op: 0.1025385856628418 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20262765884399414 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10319185256958008 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010004043579101562 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20362448692321777 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0011777877807617188 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20231962203979492 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20327234268188477 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20294594764709473 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2032012939453125 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20326948165893555 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20307612419128418 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20368266105651855 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20366334915161133 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2034289836883545 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20408344268798828 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20265483856201172 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20250797271728516 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20256543159484863 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20264315605163574 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20308136940002441 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20424604415893555 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2033083438873291 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.20363736152648926 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2034454345703125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2026526927947998 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20257329940795898 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20384883880615234 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20253610610961914 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2043302059173584 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20378780364990234 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20256781578063965 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027883529663086 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20245122909545898 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2027604579925537 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20270609855651855 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009298324584960938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.001050710678100586 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20439743995666504 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20359063148498535 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2025740146636963 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2043135166168213 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20401334762573242 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2039940357208252 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20367908477783203 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2029106616973877 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20368313789367676 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2033536434173584 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2033226490020752 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20420002937316895 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20323753356933594 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2031691074371338 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2037370204925537 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2031996250152588 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027902603149414 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.20185256004333496 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20258545875549316 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20249533653259277 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20251965522766113 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20361685752868652 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20393085479736328 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2042396068572998 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.202711820602417 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2027595043182373 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20412302017211914 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2042100429534912 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20274686813354492 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2024059295654297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0015673637390136719 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20387506484985352 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00040221214294433594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00043582916259765625 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20371413230895996 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000335693359375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00042748451232910156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0014863014221191406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004885196685791016 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004620552062988281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005848407745361328 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045943260192871094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00036454200744628906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004360675811767578 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004494190216064453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004286766052246094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006213188171386719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000469207763671875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005278587341308594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006585121154785156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00048041343688964844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004506111145019531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005209445953369141 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005183219909667969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Time to load utils op: 0.0005414485931396484 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005350112915039062 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000476837158203125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005323886871337891 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005338191986083984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003440380096435547 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005254745483398438 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004677772521972656 seconds [default3]:Time to load utils op: 0.0005996227264404297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00045990943908691406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007469654083251953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005385875701904297 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005753040313720703 seconds [default5]:Time to load utils op: 0.0005598068237304688 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005593299865722656 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007114410400390625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006177425384521484 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000522613525390625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004878044128417969 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005009174346923828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004475116729736328 seconds [default0]:Time to load utils op: 0.00041961669921875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000705718994140625 seconds [default0]:Time to load utils op: 0.0007197856903076172 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006995201110839844 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007555484771728516 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007970333099365234 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004208087921142578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007603168487548828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007758140563964844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00040602684020996094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009949207305908203 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004374980926513672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00047469139099121094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003790855407714844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007307529449462891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005998611450195312 seconds [default4]:Time to load utils op: 0.0004487037658691406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0003998279571533203 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004482269287109375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004596710205078125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000469207763671875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00044226646423339844 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003254413604736328 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005526542663574219 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004017353057861328 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003750324249267578 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006372928619384766 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005216598510742188 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00039005279541015625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0004596710205078125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007345676422119141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00048542022705078125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006418228149414062 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004878044128417969 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00048804283142089844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.00048422813415527344 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005209445953369141 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003554821014404297 seconds [default2]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004286766052246094 seconds [default2]:Time to load utils op: 0.0004291534423828125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004725456237792969 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0005066394805908203 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004949569702148438 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00046062469482421875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003943443298339844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004429817199707031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005540847778320312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005345344543457031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004801750183105469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004222393035888672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007505416870117188 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00037932395935058594 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041222572326660156 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0003962516784667969 seconds [default3]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Time to load utils op: 0.00037860870361328125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Time to load utils op: 0.0003790855407714844 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00041031837463378906 seconds [default3]:Time to load utils op: 0.0004329681396484375 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0004127025604248047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default1]:Time to load utils op: 0.0006303787231445312 seconds [default3]:Time to load utils op: 0.00038051605224609375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004668235778808594 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004968643188476562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006241798400878906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Time to load utils op: 0.0006811618804931641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003323554992675781 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004992485046386719 seconds [default5]:Time to load utils op: 0.0004527568817138672 seconds [default1]:Time to load utils op: 0.0003962516784667969 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00041174888610839844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00035452842712402344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00036716461181640625 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00034689903259277344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003616809844970703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005292892456054688 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003731250762939453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006384849548339844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00041174888610839844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045561790466308594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005660057067871094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008089542388916016 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007359981536865234 seconds [default5]:Time to load utils op: 0.0004410743713378906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00061798095703125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005700588226318359 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00054168701171875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006256103515625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00048232078552246094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007166862487792969 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00047278404235839844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006277561187744141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004127025604248047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006439685821533203 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004165172576904297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004508495330810547 seconds [default6]:Time to load utils op: 0.0004601478576660156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00041961669921875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00038552284240722656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005567073822021484 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003998279571533203 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0004837512969970703 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006983280181884766 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004379749298095703 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005154609680175781 seconds [default7]:Time to load utils op: 0.0004177093505859375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003414154052734375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005486011505126953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004401206970214844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004429817199707031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006115436553955078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005834102630615234 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003921985626220703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.00041484832763671875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00039076805114746094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004830360412597656 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006945133209228516 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012753009796142578 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0012400150299072266 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.00036716461181640625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005028247833251953 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004360675811767578 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046825408935546875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00044035911560058594 seconds [default7]:Time to load utils op: 0.00047326087951660156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0003943443298339844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004420280456542969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007433891296386719 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004413127899169922 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004353523254394531 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003981590270996094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0004372596740722656 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003535747528076172 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00048160552978515625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004012584686279297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005810260772705078 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006082057952880859 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0008375644683837891 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004801750183105469 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005784034729003906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005245208740234375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00044465065002441406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004971027374267578 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0004253387451171875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005030632019042969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00042891502380371094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005075931549072266 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005326271057128906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00045371055603027344 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0003459453582763672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00045609474182128906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00048041343688964844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004105567932128906 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006570816040039062 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00048661231994628906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004763603210449219 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005958080291748047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005161762237548828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00087738037109375 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005803108215332031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004558563232421875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.00032973289489746094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00047278404235839844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004477500915527344 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00043845176696777344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004761219024658203 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00039267539978027344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0004322528839111328 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00048422813415527344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009138584136962891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007846355438232422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007498264312744141 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00052642822265625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0006902217864990234 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00046181678771972656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007538795471191406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00040435791015625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046372413635253906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006406307220458984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0003864765167236328 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003819465637207031 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003829002380371094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004887580871582031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005712509155273438 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005598068237304688 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005230903625488281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00057220458984375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004603862762451172 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005009174346923828 seconds [default1]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0005669593811035156 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004277229309082031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0004398822784423828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004513263702392578 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007205009460449219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00044417381286621094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00044465065002441406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00044417381286621094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005338191986083984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005321502685546875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004971027374267578 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004172325134277344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00047326087951660156 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004558563232421875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.00043582916259765625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004286766052246094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004932880401611328 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00043773651123046875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003483295440673828 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0006442070007324219 seconds [default4]:Time to load utils op: 0.0007119178771972656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006632804870605469 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005464553833007812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006787776947021484 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005285739898681641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:[2022-09-03 18:58:02,402] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 18:58:02,402] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 18:58:02,402] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 18:58:02,402] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 18:58:02,402] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 18:58:02,440] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 18:58:02,440] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:58:02,441] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... 
[default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3069629669189453 seconds [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2845642566680908 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30457115173339844 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3045670986175537 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30431032180786133 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30663251876831055 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3065376281738281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30257582664489746 seconds [default0]:[2022-09-03 18:58:02,769] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-03 18:58:02,770] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 18:58:02,770] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004832744598388672 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004265308380126953 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003402233123779297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003943443298339844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.033060550689697266 seconds [default0]:[2022-09-03 18:58:02,839] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 18:58:02,839] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:58:02,839] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:[2022-09-03 18:58:02,863] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 18:58:02,864] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:58:02,864] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:[2022-09-03 18:58:02,888] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 18:58:02,889] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:58:02,889] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:[2022-09-03 18:58:02,912] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 18:58:02,913] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 18:58:02,913] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.03411531448364258 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.03408241271972656 seconds [default0]:[2022-09-03 18:58:02,994] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 18:58:02,995] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:58:02,995] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:[2022-09-03 18:58:03,028] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 18:58:03,028] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 18:58:03,028] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.51 GB, percent = 7.3% [default0]:[2022-09-03 18:58:03,029] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-03 18:58:03,029] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 18:58:03,029] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 18:58:03,029] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 18:58:03,029] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] train_batch_size ............. 
2048
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] wall_clock_breakdown ......... False
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] world_size ................... 4
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] zero_allow_untested_optimizer False
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] zero_enabled ................. False
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:991:print] zero_optimization_stage ...... 0
[default0]:[2022-09-03 18:58:03,030] [INFO] [config.py:976:print_user_config] json = {
[default0]: "train_micro_batch_size_per_gpu": 1,
[default0]: "train_batch_size": 2.048000e+03,
[default0]: "gradient_clipping": 1.0,
[default0]: "zero_optimization": {
[default0]: "stage": 0
[default0]: },
[default0]: "bf16": {
[default0]: "enabled": true
[default0]: },
[default0]: "steps_per_print": 2.000000e+03,
[default0]: "wall_clock_breakdown": false,
[default0]: "checkpoint": {
[default0]: "load_universal": true
[default0]: }
[default0]:}
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:No modifications detected for re-loaded extension module utils, skipping build step...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.0004773139953613281 seconds
[default0]:[2022-09-03 18:58:03,031] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1
[default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,624] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 18:58:03,625] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default1]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 18:58:04,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 18:58:04,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 18:58:04,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 18:58:04,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 18:58:04,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:04,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 18:58:04,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 18:58:12,591] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default1]:[2022-09-03 18:58:13,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default3]:[2022-09-03 18:58:13,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default0]:[2022-09-03 18:58:13,900] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default1]:[2022-09-03 18:58:14,021] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default0]:[2022-09-03 18:58:14,013] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default2]:[2022-09-03 18:58:14,021] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default7]:[2022-09-03 18:58:14,155] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default7]:[2022-09-03 18:58:14,641] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default3]:[2022-09-03 18:58:14,921] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default2]:[2022-09-03 18:58:14,918] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default4]:[2022-09-03 18:58:15,417] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default5]:[2022-09-03 18:58:15,424] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default6]:[2022-09-03 18:58:16,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default7]:[2022-09-03 18:58:16,215] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default5]:[2022-09-03 18:58:16,336] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default4]:[2022-09-03 18:58:16,320] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default3]:[2022-09-03 18:58:16,545] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default4]:[2022-09-03 18:58:16,512] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default5]:[2022-09-03 18:58:16,514] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default6]:[2022-09-03 18:58:16,600] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default6]:[2022-09-03 18:58:16,828] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default3]:[2022-09-03 18:58:17,165] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default7]:[2022-09-03 18:58:17,180] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default3]:[2022-09-03 18:58:17,248] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default2]:[2022-09-03 18:58:17,308] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default3]:[2022-09-03 18:58:17,310] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default7]:[2022-09-03 18:58:17,433] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default7]:[2022-09-03 18:58:17,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default6]:[2022-09-03 18:58:17,536] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default0]:[2022-09-03 18:58:17,732] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default3]:[2022-09-03 18:58:17,651] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default1]:[2022-09-03 18:58:17,743] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default3]:[2022-09-03 18:58:17,914] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default6]:[2022-09-03 18:58:18,061] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default7]:[2022-09-03 18:58:18,058] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default6]:[2022-09-03 18:58:18,053] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default7]:[2022-09-03 18:58:18,146] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default5]:[2022-09-03 18:58:18,144] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default4]:[2022-09-03 18:58:18,141] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default4]:[2022-09-03 18:58:18,333] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default5]:[2022-09-03 18:58:18,372] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default4]:[2022-09-03 18:58:18,371] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default3]:[2022-09-03 18:58:18,387] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default5]:[2022-09-03 18:58:18,382] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default3]:[2022-09-03 18:58:18,524] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default4]:[2022-09-03 18:58:18,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default0]:[2022-09-03 18:58:18,504] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default5]:[2022-09-03 18:58:18,548] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default1]:[2022-09-03 18:58:18,508] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default0]:[2022-09-03 18:58:18,553] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default7]:[2022-09-03 18:58:18,568] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default1]:[2022-09-03 18:58:18,572] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default7]:[2022-09-03 18:58:18,562] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default5]:[2022-09-03 18:58:18,567] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default4]:[2022-09-03 18:58:18,560] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default7]:[2022-09-03 18:58:18,809] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default2]:[2022-09-03 18:58:18,870] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default7]:[2022-09-03 18:58:18,885] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default3]:[2022-09-03 18:58:19,104] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default2]:[2022-09-03 18:58:19,107] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default3]:[2022-09-03 18:58:19,083] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default3]:[2022-09-03 18:58:19,085] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default2]:[2022-09-03 18:58:19,191] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default7]:[2022-09-03 18:58:19,265] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default7]:[2022-09-03 18:58:19,345] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default3]:[2022-09-03 18:58:19,378] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default6]:[2022-09-03 18:58:19,411] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default3]:[2022-09-03 18:58:19,377] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default2]:[2022-09-03 18:58:19,377] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default2]:[2022-09-03 18:58:19,526] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default1]:[2022-09-03 18:58:19,561] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-03 18:58:19,543] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default3]:[2022-09-03 18:58:19,591] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default0]:[2022-09-03 18:58:19,733] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default3]:[2022-09-03 18:58:19,696] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default2]:[2022-09-03 18:58:19,685] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default1]:[2022-09-03 18:58:19,727] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default0]:[2022-09-03 18:58:19,727] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default7]:[2022-09-03 18:58:19,662] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default6]:[2022-09-03 18:58:19,696] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default7]:[2022-09-03 18:58:19,698] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default1]:[2022-09-03 18:58:19,748] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default6]:[2022-09-03 18:58:19,793] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default2]:[2022-09-03 18:58:20,011] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default7]:[2022-09-03 18:58:19,990] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default2]:[2022-09-03 18:58:20,140] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default4]:[2022-09-03 18:58:20,101] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default4]:[2022-09-03 18:58:20,067] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default7]:[2022-09-03 18:58:20,083] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default5]:[2022-09-03 18:58:20,076] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default0]:[2022-09-03 18:58:20,096] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default1]:[2022-09-03 18:58:20,102] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default7]:[2022-09-03 18:58:20,146] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default0]:[2022-09-03 18:58:20,148] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default7]:[2022-09-03 18:58:20,081] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default3]:[2022-09-03 18:58:20,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default2]:[2022-09-03 18:58:20,124] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default5]:[2022-09-03 18:58:20,106] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default3]:[2022-09-03 18:58:20,086] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default3]:[2022-09-03 18:58:20,231] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default1]:[2022-09-03 18:58:20,153] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default2]:[2022-09-03 18:58:20,253] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default3]:[2022-09-03 18:58:20,257] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default7]:[2022-09-03 18:58:20,264] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default6]:[2022-09-03 18:58:20,262] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default4]:[2022-09-03 18:58:20,300] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default0]:[2022-09-03 18:58:20,299] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default3]:[2022-09-03 18:58:20,266] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default1]:[2022-09-03 18:58:20,311] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default2]:[2022-09-03 18:58:20,347] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default3]:[2022-09-03 18:58:20,321] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default5]:[2022-09-03 18:58:20,287] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default7]:[2022-09-03 18:58:20,339] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default2]:[2022-09-03 18:58:20,325] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default7]:[2022-09-03 18:58:20,381] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default1]:[2022-09-03 18:58:20,373] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default0]:[2022-09-03 18:58:20,392] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default3]:[2022-09-03 18:58:20,511] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default7]:[2022-09-03 18:58:20,554] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default3]:[2022-09-03 18:58:20,628] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default3]:[2022-09-03 18:58:20,656] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default5]:[2022-09-03 18:58:20,587] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default4]:[2022-09-03 18:58:20,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default3]:[2022-09-03 18:58:20,670] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default3]:[2022-09-03 18:58:20,706] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default5]:[2022-09-03 18:58:20,677] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default7]:[2022-09-03 18:58:20,739] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default6]:[2022-09-03 18:58:20,667] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default5]:[2022-09-03 18:58:20,701] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default4]:[2022-09-03 18:58:20,706] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default4]:[2022-09-03 18:58:20,774] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default6]:[2022-09-03 18:58:20,845] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 150 [default6]:[2022-09-03 18:58:20,763] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default5]:[2022-09-03 18:58:20,770] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default4]:[2022-09-03 18:58:20,774] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default0]:[2022-09-03 18:58:20,794] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default1]:[2022-09-03 18:58:20,806] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default7]:[2022-09-03 18:58:20,842] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default6]:[2022-09-03 18:58:20,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default3]:[2022-09-03 18:58:20,906] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default2]:[2022-09-03 18:58:20,882] [INFO] [engine.py:2763:_load_zero_checkpoint] 
loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default7]:[2022-09-03 18:58:20,939] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default2]:[2022-09-03 18:58:20,922] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default4]:[2022-09-03 18:58:21,005] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default5]:[2022-09-03 18:58:21,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default0]:[2022-09-03 18:58:21,000] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default1]:[2022-09-03 18:58:21,000] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default6]:[2022-09-03 18:58:21,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default2]:[2022-09-03 18:58:20,959] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default4]:[2022-09-03 18:58:20,956] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default5]:[2022-09-03 18:58:20,966] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default6]:[2022-09-03 18:58:21,006] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default7]:[2022-09-03 18:58:21,008] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default7]:[2022-09-03 18:58:21,060] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default2]:[2022-09-03 18:58:21,014] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default3]:[2022-09-03 18:58:21,025] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default4]:[2022-09-03 18:58:21,135] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default5]:[2022-09-03 18:58:21,134] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default6]:[2022-09-03 18:58:21,155] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default6]:[2022-09-03 18:58:21,068] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default0]:[2022-09-03 18:58:21,152] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default6]:[2022-09-03 18:58:21,160] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default4]:[2022-09-03 18:58:21,089] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default5]:[2022-09-03 18:58:21,085] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default7]:[2022-09-03 18:58:21,160] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default5]:[2022-09-03 18:58:21,234] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default3]:[2022-09-03 18:58:21,150] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default0]:[2022-09-03 18:58:21,230] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default1]:[2022-09-03 18:58:21,240] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default4]:[2022-09-03 18:58:21,224] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default1]:[2022-09-03 18:58:21,230] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default1]:[2022-09-03 18:58:21,161] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default0]:[2022-09-03 18:58:21,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default4]:[2022-09-03 18:58:21,282] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default5]:[2022-09-03 18:58:21,196] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default2]:[2022-09-03 18:58:21,329] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default6]:[2022-09-03 18:58:21,336] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default0]:[2022-09-03 18:58:21,301] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default6]:[2022-09-03 18:58:21,339] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default7]:[2022-09-03 18:58:21,355] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default1]:[2022-09-03 18:58:21,306] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default0]:[2022-09-03 18:58:21,314] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default1]:[2022-09-03 18:58:21,315] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default0]:[2022-09-03 18:58:21,383] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default1]:[2022-09-03 18:58:21,393] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default0]:[2022-09-03 18:58:21,448] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default1]:[2022-09-03 18:58:21,448] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default4]:[2022-09-03 18:58:21,435] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-03 18:58:21,434] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default2]:[2022-09-03 18:58:21,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default0]:[2022-09-03 18:58:21,482] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default7]:[2022-09-03 18:58:21,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default1]:[2022-09-03 18:58:21,490] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default4]:[2022-09-03 18:58:21,534] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default5]:[2022-09-03 18:58:21,535] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default1]:[2022-09-03 18:58:21,582] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default0]:[2022-09-03 18:58:21,581] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default3]:[2022-09-03 18:58:21,629] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default2]:[2022-09-03 18:58:21,613] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default2]:[2022-09-03 18:58:21,565] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default0]:[2022-09-03 18:58:21,654] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default3]:[2022-09-03 18:58:21,710] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default6]:[2022-09-03 18:58:21,646] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default5]:[2022-09-03 18:58:21,740] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default4]:[2022-09-03 18:58:21,723] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default1]:[2022-09-03 18:58:21,679] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default3]:[2022-09-03 18:58:21,743] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default1]:[2022-09-03 18:58:21,839] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default7]:[2022-09-03 18:58:21,847] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default2]:[2022-09-03 18:58:21,864] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default0]:[2022-09-03 18:58:21,898] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default6]:[2022-09-03 18:58:21,878] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default6]:[2022-09-03 18:58:21,887] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default6]:[2022-09-03 18:58:21,938] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default0]:[2022-09-03 18:58:22,043] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default1]:[2022-09-03 18:58:22,043] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default2]:[2022-09-03 18:58:21,973] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default1]:[2022-09-03 18:58:22,048] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default4]:[2022-09-03 18:58:22,050] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default2]:[2022-09-03 18:58:21,990] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default0]:[2022-09-03 18:58:22,058] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 48 [default7]:[2022-09-03 18:58:22,114] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default4]:[2022-09-03 18:58:22,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default1]:[2022-09-03 18:58:22,068] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default0]:[2022-09-03 18:58:22,052] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default4]:[2022-09-03 18:58:22,128] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default5]:[2022-09-03 18:58:22,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default5]:[2022-09-03 18:58:22,074] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default0]:[2022-09-03 18:58:22,066] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default1]:[2022-09-03 18:58:22,064] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded 
universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default4]:[2022-09-03 18:58:22,109] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default5]:[2022-09-03 18:58:22,067] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default0]:[2022-09-03 18:58:22,176] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default1]:[2022-09-03 18:58:22,184] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default2]:[2022-09-03 18:58:22,176] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default5]:[2022-09-03 18:58:22,152] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default1]:[2022-09-03 18:58:22,207] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default0]:[2022-09-03 18:58:22,199] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default4]:[2022-09-03 18:58:22,195] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default5]:[2022-09-03 18:58:22,196] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default6]:[2022-09-03 18:58:22,224] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default2]:[2022-09-03 18:58:22,194] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default6]:[2022-09-03 18:58:22,348] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default2]:[2022-09-03 18:58:22,352] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default4]:[2022-09-03 18:58:22,411] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default5]:[2022-09-03 18:58:22,432] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default5]:[2022-09-03 18:58:22,399] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default2]:[2022-09-03 18:58:22,413] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default4]:[2022-09-03 18:58:22,443] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default5]:[2022-09-03 18:58:22,389] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default6]:[2022-09-03 18:58:22,375] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default1]:[2022-09-03 18:58:22,496] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default0]:[2022-09-03 18:58:22,497] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default7]:[2022-09-03 18:58:22,537] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default6]:[2022-09-03 18:58:22,511] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default6]:[2022-09-03 18:58:22,498] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default2]:[2022-09-03 18:58:22,623] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default4]:[2022-09-03 18:58:22,632] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default1]:[2022-09-03 18:58:22,572] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default6]:[2022-09-03 18:58:22,594] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default0]:[2022-09-03 18:58:22,633] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default4]:[2022-09-03 18:58:22,653] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default6]:[2022-09-03 18:58:22,699] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default5]:[2022-09-03 18:58:22,659] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default0]:[2022-09-03 18:58:22,717] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default0]:[2022-09-03 18:58:22,727] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default5]:[2022-09-03 18:58:22,672] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default1]:[2022-09-03 18:58:22,744] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default4]:[2022-09-03 18:58:22,670] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default5]:[2022-09-03 18:58:22,737] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default4]:[2022-09-03 18:58:22,747] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default2]:[2022-09-03 18:58:22,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default6]:[2022-09-03 18:58:22,829] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default6]:[2022-09-03 18:58:22,785] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default1]:[2022-09-03 18:58:22,796] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default2]:[2022-09-03 18:58:22,777] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default2]:[2022-09-03 18:58:22,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default6]:[2022-09-03 18:58:22,851] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default2]:[2022-09-03 18:58:22,869] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default4]:[2022-09-03 18:58:26,576] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default2]:[2022-09-03 18:58:27,192] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default6]:[2022-09-03 18:58:27,280] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default7]:[2022-09-03 18:58:27,506] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default5]:[2022-09-03 18:58:27,537] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default3]:[2022-09-03 18:58:27,662] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default1]:[2022-09-03 18:58:28,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default0]:[2022-09-03 18:58:29,540] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]:could not find arguments in the checkpoint ... [default0]: checkpoint version 3.0 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 0 [default7]:time (ms) | load-checkpoint: 25035.54 [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-03 18:58:29 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 133120 [default0]: test: 10240 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.059147 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.009922 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003458 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.032 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.094530 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.125169 [default0]: using: [default0]: number of documents: 761704 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 221749 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.010130 [default0]: > building shuffle index with split [0, 221749) and [221749, 221749) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.007118 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_4424ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_4424ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_4424ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.144886 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.031020 [default0]: using: [default0]: number of documents: 307120 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 136142 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.020988 [default0]: > building shuffle index with split [0, 136142) and [136142, 136142) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005057 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_1505ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_1505ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_1505ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.111 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: 
> building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.113362 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.045105 [default0]: using: [default0]: number of documents: 1308850 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 432310 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.011068 [default0]: > building shuffle index with split [0, 432310) and [432310, 432310) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.010439 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_17429ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_17429ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_17429ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.100260 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.041750 [default0]: using: [default0]: number of documents: 1042233 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 521544 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.011469 [default0]: > building shuffle index with split [0, 521544) and [521544, 521544) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.011372 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_29662ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_29662ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_29662ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.095 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.011291 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.089202 [default0]: using: [default0]: number of documents: 3350291 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1740320 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.034548 [default0]: > building shuffle index with split [0, 1740320) and [1740320, 1740320) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.036152 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_14273ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_14273ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_14273ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.135 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 
[default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003271 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.012170 [default0]: using: [default0]: number of documents: 257490 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 26369 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003474 [default0]: > building shuffle index with split [0, 26369) and [26369, 26369) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003406
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_209ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_209ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_209ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.060 seconds
[default0]: total number of samples: 26370
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.057828 seconds
[default0]: number of documents: 58847091
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [55904736, 58847091) total of 2942355 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.120660
[default0]: using:
[default0]: number of documents: 2942355
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 1458653
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.044561
[default0]: > building shuffle index with split [0, 1458653) and [1458653, 1458653) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.030069
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_17465ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_17465ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_17465ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.092 seconds
[default0]: total number of samples: 1458654
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.018980 seconds
[default0]: number of documents: 12514253
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [11888540, 12514253) total of 625713 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.028198
[default0]: using:
[default0]: number of documents: 625713
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 134070
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005606
[default0]: > building shuffle index with split [0, 134070) and [134070, 134070) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005241
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_1461ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_1461ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_1461ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.067 seconds
[default0]: total number of samples: 134071
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.068860 seconds
[default0]: number of documents: 180608
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [171578, 180608) total of 9030 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.005999
[default0]: using:
[default0]: number of documents: 9030
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 2500
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002876
[default0]: > building shuffle index with split [0, 2500) and [2500, 2500) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002805
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_15ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_15ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_15ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 2501
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.164307 seconds
[default0]: number of documents: 12303134
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [11687977, 12303134) total of 615157 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.018185
[default0]: using:
[default0]: number of documents: 615157
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 157243
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.006374
[default0]: > building shuffle index with split [0, 157243) and [157243, 157243) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.006032
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_735ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_735ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_735ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 157244
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.103710 seconds
[default0]: number of documents: 2033057
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [1931404, 2033057) total of 101653 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.010620
[default0]: using:
[default0]: number of documents: 101653
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 20516
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002998
[default0]: > building shuffle index with split [0, 20516) and [20516, 20516) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002666
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_54ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_54ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_54ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 20517
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.161425 seconds
[default0]: number of documents: 26793553
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [25453875, 26793553) total of 1339678 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.040178
[default0]: using:
[default0]: number of documents: 1339678
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 101501
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.008466
[default0]: > building shuffle index with split [0, 101501) and [101501, 101501) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004947
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1000ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1000ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1000ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.005 seconds
[default0]: total number of samples: 101502
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.130706 seconds
[default0]: number of documents: 3155990
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [2998190, 3155990) total of 157800 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.013721
[default0]: using:
[default0]: number of documents: 157800
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 44181
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003796
[default0]: > building shuffle index with split [0, 44181) and [44181, 44181) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003505
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_83ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_83ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_83ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.071 seconds
[default0]: total number of samples: 44182
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.065134 seconds
[default0]: number of documents: 6692522
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [6357896, 6692522) total of 334626 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.012316
[default0]: using:
[default0]: number of documents: 334626
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 47612
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004264
[default0]: > building shuffle index with split [0, 47612) and [47612, 47612) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003734
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 47613
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.101584 seconds
[default0]: number of documents: 3017261
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [2866398, 3017261) total of 150863 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.010344
[default0]: using:
[default0]: number of documents: 150863
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 29297
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003359
[default0]: > building shuffle index with split [0, 29297) and [29297, 29297) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003000
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_68ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_68ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_68ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.064 seconds
[default0]: total number of samples: 29298
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.077707 seconds
[default0]: number of documents: 3648041
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [3465639, 3648041) total of 182402 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009980
[default0]: using:
[default0]: number of documents: 182402
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 5658
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004229
[default0]: > building shuffle index with split [0, 5658) and [5658, 5658) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003215
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_90ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_90ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_90ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 5659
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.139771 seconds
[default0]: number of documents: 4327282
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [4110918, 4327282) total of 216364 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.012561
[default0]: using:
[default0]: number of documents: 216364
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 12422
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002906
[default0]: > building shuffle index with split [0, 12422) and [12422, 12422) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003739
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_49ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_49ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_49ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 12423
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.157192 seconds
[default0]: number of documents: 2698896
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [2563951, 2698896) total of 134945 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009589
[default0]: using:
[default0]: number of documents: 134945
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 19132
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003359
[default0]: > building shuffle index with split [0, 19132) and [19132, 19132) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002833
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_69ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_69ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_69ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.003 seconds
[default0]: total number of samples: 19133
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.136582 seconds
[default0]: number of documents: 12767593
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [12129213, 12767593) total of 638380 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.018772
[default0]: using:
[default0]: number of documents: 638380
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 87927
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005090
[default0]: > building shuffle index with split [0, 87927) and [87927, 87927) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004174
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_283ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_283ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_283ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.004 seconds
[default0]: total number of samples: 87928
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.102635 seconds
[default0]: number of documents: 4342323
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [4125207, 4342323) total of 217116 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.011350
[default0]: using:
[default0]: number of documents: 217116
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 69779
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003665
[default0]: > building shuffle index with split [0, 69779) and [69779, 69779) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003401
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_123ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_123ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_123ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.002 seconds
[default0]: total number of samples: 69780
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.120230 seconds
[default0]: number of documents: 3022722
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [2871586, 3022722) total of 151136 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009461
[default0]: using:
[default0]: number of documents: 151136
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 22531
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002474
[default0]: > building shuffle index with split [0, 22531) and [22531, 22531) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.006346 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_167ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_167ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_167ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.104624 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.006468 [default0]: using: [default0]: number of documents: 58128 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1607 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002848 [default0]: > building shuffle index with split [0, 1607) and [1607, 1607) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002283 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_43ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_43ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_43ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.202223 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.095453 [default0]: using: [default0]: number of documents: 2764732 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 690620 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.019523 [default0]: > building shuffle index with split [0, 690620) and [690620, 690620) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.014755 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_10887ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_10887ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_10887ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.012 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.158098 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.059235 [default0]: using: [default0]: number of documents: 2242781 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 468688 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.015439 [default0]: > building shuffle index with split [0, 468688) and [468688, 468688) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.010504 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_7398ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_7398ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_7398ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.067 seconds [default0]: total number of samples: 468689 [default0]: 
total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.115082 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.042203 [default0]: using: [default0]: number of documents: 1598495 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 497624 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.012773 [default0]: > building shuffle index with split [0, 497624) and [497624, 497624) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.011838 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_6628ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_6628ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_6628ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.265 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004071 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.050845 [default0]: using: [default0]: number of documents: 1705519 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 125119 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.010062 [default0]: > building shuffle index with split [0, 125119) and [125119, 125119) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.050143 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_3294ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_3294ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_3294ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.174 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.009254 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.054365 [default0]: using: [default0]: number of documents: 2188081 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1010591 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.020901 [default0]: > building shuffle index with split [0, 1010591) and [1010591, 1010591) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.019858 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_16178ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_16178ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_16178ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.074 seconds [default0]: total number of samples: 1010592 [default0]: total number of 
epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.000724 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.006237 [default0]: using: [default0]: number of documents: 9880 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 4450 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002215 [default0]: > building shuffle index with split [0, 4450) and [4450, 4450) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002120 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_70ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_70ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_70ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios:
[default0]: dataset 0, input: 0.0330676, achieved: 0.0330676
[default0]: dataset 1, input: 0.0112421, achieved: 0.0112421
[default0]: dataset 2, input: 0.130272, achieved: 0.130272
[default0]: dataset 3, input: 0.221712, achieved: 0.221712
[default0]: dataset 4, input: 0.106678, achieved: 0.106678
[default0]: dataset 5, input: 0.00155951, achieved: 0.00155955
[default0]: dataset 6, input: 0.13054, achieved: 0.13054
[default0]: dataset 7, input: 0.010918, achieved: 0.0109181
[default0]: dataset 8, input: 0.000110214, achieved: 0.000110257
[default0]: dataset 9, input: 0.00549238, achieved: 0.00549235
[default0]: dataset 10, input: 0.000402122, achieved: 0.000402094
[default0]: dataset 11, input: 0.00747007, achieved: 0.00747007
[default0]: dataset 12, input: 0.000619047, achieved: 0.000619024
[default0]: dataset 13, input: 0.00103353, achieved: 0.0010336
[default0]: dataset 14, input: 0.000501201, achieved: 0.000501226
[default0]: dataset 15, input: 0.000667277, achieved: 0.000667231
[default0]: dataset 16, input: 0.000359281, achieved: 0.000359326
[default0]: dataset 17, input: 0.000508443, achieved: 0.000508519
[default0]: dataset 18, input: 0.00211373, achieved: 0.0021138
[default0]: dataset 19, input: 0.000912995, achieved: 0.000912961
[default0]: dataset 20, input: 0.00124543, achieved: 0.00124546
[default0]: dataset 21, input: 0.000315887, achieved: 0.00031594
[default0]: dataset 22, input: 0.0813721, achieved: 0.0813721
[default0]: dataset 23, input: 0.0552939, achieved: 0.0552939
[default0]: dataset 24, input: 0.0495415, achieved: 0.0495414
[default0]: dataset 25, input: 0.0246164, achieved: 0.0246163
[default0]: dataset 26, input: 0.120917, achieved: 0.120917
[default0]: dataset 27, input: 0.000517703, achieved: 0.000517666
[default0]:> elapsed time for building blendable dataset indices: 0.32 (sec)
[default0]: > building dataset index ...
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: > finished creating indexed dataset in 0.000496 seconds
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() 
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: d = build_dataset_group_gpt( [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: 
dataset = _build_single_datasets(paths[0], [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, 
valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = 
_build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() 
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 
'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, 
in train_valid_test_datasets_provider [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: 
pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = 
get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call 
last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: d = build_dataset_group_gpt( [default7]: main() [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: return f(*args, **kwargs) [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset.sizes.shape[0])) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: pretrain( [default2]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider 
[default6]: d = build_dataset_group_gpt( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() 
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default5]:Traceback 
(most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = 
build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: dataset = _build_single_datasets(paths[0], [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: d = build_dataset_group_gpt( [default7]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = 
get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = 
build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) 
[default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: 
indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in
pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) 
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Traceback (most recent call last): 
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
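Note on the failure above: every rank prints "Dataset does not exist" for the same prefix and then raises AttributeError: 'NoneType' object has no attribute 'sizes', which suggests the dataset loader returned None and the caller used the result without checking it. The message says the path must be a basename to which both .idx and .bin can be appended. A minimal pre-flight check along those lines could look like the sketch below. The helper name missing_indexed_dataset_files is hypothetical and is not part of Megatron-DeepSpeed; it only tests for the two files named in the log message.

```python
from pathlib import Path


def missing_indexed_dataset_files(prefix):
    """Return the expected dataset files that are absent for a given prefix.

    Hypothetical helper (not a Megatron-DeepSpeed API). It assumes, as the
    log message states, that a dataset prefix is valid only if both
    `<prefix>.idx` and `<prefix>.bin` exist.
    """
    expected = [prefix + ext for ext in (".idx", ".bin")]
    # Keep only the paths that do not point to a regular file.
    return [path for path in expected if not Path(path).is_file()]
```

Calling this on each data prefix before launching the job, and aborting when the returned list is non-empty, would surface the missing files once instead of as one traceback per rank.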
[default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = 
build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) 
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last):
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default1]: d = build_dataset_group_gpt(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default1]: dataset = _build_single_datasets(paths[0],
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default1]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default1]: indexed_dataset.sizes.shape[0]))
[default1]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default0]: d = build_dataset_group_gpt( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = 
_build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: return f(*args, **kwargs) [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: 
indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: pretrain( [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]:Traceback (most recent call last): [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: indexed_dataset = 
get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: d = build_dataset_group_gpt( [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default2]: d = build_dataset_group_gpt( [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]:Traceback (most recent call last): [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full 
filenames. [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default7]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
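Note on the repeated messages above: every rank reports that the dataset prefix does not exist, and the later `AttributeError: 'NoneType' object has no attribute 'sizes'` appears to follow from that (the loader seems to have returned `None` for the missing dataset). A minimal pre-flight sketch, assuming only what the log message states, namely that `<prefix>.idx` and `<prefix>.bin` must both exist. The helper name `missing_dataset_files` is hypothetical and is not part of Megatron-DeepSpeed.

```python
import os


def missing_dataset_files(data_prefix):
    """Return the expected dataset files that are absent on disk.

    Assumption taken from the log message: the prefix is a basename to
    which '.idx' and '.bin' are appended to get the full filenames.
    """
    return [
        data_prefix + ext
        for ext in (".idx", ".bin")
        if not os.path.isfile(data_prefix + ext)
    ]
```

Running such a check on each configured data prefix before launching the job would surface a missing or misnamed dataset once, rather than as one traceback per rank.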
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) 
[default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last):
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: return f(*args, **kwargs)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default4]: d = build_dataset_group_gpt(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default4]: dataset = _build_single_datasets(paths[0],
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default4]: indexed_dataset.sizes.shape[0]))
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default3]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, 
test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, 
valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default1]: d = build_dataset_group_gpt( [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Path should be a basename that both .idx and .bin can be appended to get 
full filenames. [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default0]: dataset = _build_single_datasets(paths[0],
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default0]: indexed_dataset.sizes.shape[0]))
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Traceback (most recent call last):
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default0]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: d = build_dataset_group_gpt(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default4]: dataset = _build_single_datasets(paths[0],
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default4]: indexed_dataset.sizes.shape[0]))
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: pretrain( [default3]: return f(*args, **kwargs) [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: pretrain( [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default0]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: main() [default3]: d = build_dataset_group_gpt( [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Traceback (most recent call last): [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: return f(*args, **kwargs) [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: main() [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Traceback (most recent call last): [default2]: pretrain( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: pretrain( [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: pretrain( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: dataset = _build_single_datasets(paths[0], [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: dataset 
= _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default4]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' 
[default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document 
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) 
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: return f(*args, **kwargs) [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default5]: dataset = _build_single_datasets(paths[0], [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: main() [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: pretrain( [default1]: dataset = _build_single_datasets(paths[0], [default4]: return f(*args, **kwargs) [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, 
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: pretrain( [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group 
[default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) 
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Traceback (most recent call last): [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: 
indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) 
[default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return 
f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: dataset = _build_single_datasets(paths[0], [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document 
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: pretrain( [default4]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: main() [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: main() [default4]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]:Traceback (most recent call last): [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: pretrain( [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: pretrain( [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: d = build_dataset_group_gpt( [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: d = build_dataset_group_gpt( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default2]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: dataset = _build_single_datasets(paths[0], [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default4]: return f(*args, **kwargs) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: pretrain( [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: 
indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: main() [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: pretrain( [default3]: dataset = _build_single_datasets(paths[0], [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]:Traceback (most recent call last): [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a 
basename that both .idx and .bin can be appended to get full filenames. [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: 
main() [default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: pretrain( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]:Dataset does not exist: 
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: dataset = _build_single_datasets(paths[0], [default2]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Traceback (most recent call last): [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, 
in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]: dataset = _build_single_datasets(paths[0], [default1]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: 
return f(*args, **kwargs)
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default7]: d = build_dataset_group_gpt(
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: indexed_dataset.sizes.shape[0]))
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default3]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default7]: dataset = _build_single_datasets(paths[0],
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default1]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default1]: d = build_dataset_group_gpt(
[default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default7]: indexed_dataset.sizes.shape[0]))
[default7]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: return f(*args, **kwargs)
[default2]:Traceback (most recent call last):
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default1]: dataset = _build_single_datasets(paths[0],
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default1]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default1]: indexed_dataset.sizes.shape[0]))
[default1]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]: pretrain(
[default1]:Traceback (most recent call last):
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]: main()
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:AttributeError: 'NoneType' 
object has no attribute 'sizes' [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]:Traceback (most recent call last): [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: return f(*args, 
**kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: dataset = _build_single_datasets(paths[0], [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: d = build_dataset_group_gpt( [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default6]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3020750) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 513279) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 
0 (pid: 2228405) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1778673) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3954902) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2135275) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2981931) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3593025) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1443346) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3040890) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 370942) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1961246) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1319936) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 928161) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3632156) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1981024) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3784745) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2931742) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 249072) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1552573) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1891446) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2017394) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3635523) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1715071) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3153359) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 408632) of 
binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1374287) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1971999) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1800876) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2670594) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3608608) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3914816) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2637938) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 420574) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1580752) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 514158) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( raise ChildFailedError( return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2670595) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 370943) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2670596) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2670597) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no 
attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 370944) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2670598) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2670599) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 370945) error_file: 
/tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 370946) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/4/error.json traceback : Traceback (most recent call last): rank : 230 (local_rank: 6) exitcode : 1 (pid: 2670600) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/6/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) run(args) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam41-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 370947) 
error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 231 (local_rank: 7) exitcode : 1 (pid: 2670601) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 
2022-09-03_18:58:41 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2670594) error_file: /tmp/torchelastic_av9eexgq/none_mm87nrxb/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( rank : 166 (local_rank: 6) exitcode : 1 (pid: 370948) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 
2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 167 (local_rank: 7) exitcode : 1 (pid: 370949) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return launch_agent(self._config, self._entrypoint, list(args)) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no 
attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 370942) error_file: /tmp/torchelastic_soz_go3o/none_mi938j6p/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1580753) error_file: /tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1580754) error_file: 
/tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1580755) error_file: 
/tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 252 (local_rank: 4) exitcode : 1 (pid: 1580756) error_file: 
/tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/4/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 
1 (pid: 1580757) error_file: /tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1715072) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( rank : 254 (local_rank: 6) exitcode : 1 (pid: 1580758) error_file: /tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 265 (local_rank: 1) exitcode : 1 (pid: 3914817) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( rank : 255 (local_rank: 7) exitcode : 1 (pid: 1580759) error_file: /tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1580752) error_file: /tmp/torchelastic_z39g79qz/none_t5ybexzc/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1715073) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 266 (local_rank: 2) exitcode : 1 (pid: 3914818) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 267 (local_rank: 3) exitcode : 1 (pid: 3914819) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1715074) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 3914820) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1715075) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 3914821) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1715076) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 270 (local_rank: 6) exitcode : 1 (pid: 3914822) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 174 (local_rank: 6) exitcode : 1 (pid: 1715077) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 271 (local_rank: 7) exitcode : 1 (pid: 3914823) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 175 (local_rank: 7) exitcode : 1 (pid: 1715078) error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' raise ChildFailedError( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) exitcode : 1 (pid: 3914816) error_file: /tmp/torchelastic_25cgeui3/none_yo2ap4ww/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-09-03_18:58:41
  host      : jean-zay-iam32-ib0
  rank      : 153 (local_rank: 1)
  exitcode  : 1 (pid: 513280)
  error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/1/error.json
  traceback : Traceback (most recent call last):
    File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
      pretrain(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
      train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
      train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
      d = build_dataset_group_gpt(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
      dataset = _build_single_datasets(paths[0],
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
      indexed_dataset =
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 513281) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 513282) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 513283) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 513284) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/5/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 513285) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 159 (local_rank: 7) exitcode : 1 (pid: 513286) error_file: /tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 65 (local_rank: 1) exitcode : 1 (pid: 1972000) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam32-ib0 rank : 152 (local_rank: 0) exitcode : 1 (pid: 513279) error_file: 
/tmp/torchelastic_ofoeajhr/none_8nhu0xic/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 1972001) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 67 (local_rank: 3) exitcode : 1 (pid: 1972002) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 1972003) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 1972004) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 70 (local_rank: 6) exitcode : 1 (pid: 1972005) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 1972006) error_file: /tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam11-ib0 rank : 64 (local_rank: 0) exitcode : 1 (pid: 1971999) error_file: 
/tmp/torchelastic_r0eiahzw/none__d3zjr36/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise ChildFailedError( raise ChildFailedError( return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 
2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1778674) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 514159) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1778675) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 146 (local_rank: 2) exitcode : 1 (pid: 514160) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1778676) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 
514161) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1778677) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
[4]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 514162) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1778678) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 514163) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, 
run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, rank : 286 (local_rank: 6) exitcode : 1 (pid: 1778679) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, rank : 150 (local_rank: 6) exitcode : 1 (pid: 514164) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in 
_run_code train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in rank : 287 (local_rank: 7) exitcode : 1 (pid: 1778680) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main rank : 151 (local_rank: 7) exitcode : 1 (pid: 514165) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no 
attribute 'sizes' return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' elastic_launch( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main ------------------------------------------------------------ Root 
Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1778673) error_file: /tmp/torchelastic_endyakcn/none_q_podn4f/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 514158) error_file: /tmp/torchelastic_kzf5ozwy/none_7e75mpsf/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1552574) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1552575) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) exitcode : 1 (pid: 1552576) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1552577) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/4/error.json traceback : Traceback (most recent call last): run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1552578) error_file: 
/tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time 
: 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent rank : 182 (local_rank: 6) exitcode : 1 (pid: 1552579) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run rank : 183 (local_rank: 7) exitcode : 1 (pid: 1552580) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) run(args) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise ChildFailedError( main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1552573) error_file: /tmp/torchelastic_peb_4hhf/none_d10j6h_i/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 
1961247) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 1961248) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3040891) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 25 (local_rank: 1) exitcode : 1 (pid: 3020751) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3593026) error_file: 
/tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2017395) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", 
line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3040892) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 257 (local_rank: 1) exitcode : 1 (pid: 408633) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1374288) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 1961249) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in raise ChildFailedError( raise ChildFailedError( exec(code, run_globals) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2135276) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, 
in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3020752) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3593027) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 (pid: 2017396) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 2981932) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1319937) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 3954903) error_file: 
/tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 408634) 
error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2637939) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1374289) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 17 (local_rank: 1) exitcode : 1 (pid: 1981025) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 1961250) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3784746) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 81 (local_rank: 1) exitcode : 1 (pid: 2228406) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3040893) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 2931743) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1800877) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 420575) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2135277) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 2981933) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1319938) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 
(pid: 3954904) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3020753) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2637940) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 139 
(local_rank: 3) exitcode : 1 (pid: 3593028) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 1981026) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 59 
(local_rank: 3) exitcode : 1 (pid: 2017397) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3784747) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2228407) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/2/error.json traceback : Traceback (most recent 
call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3040894) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 2931744) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1800878) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 420576) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 408635) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' 
object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 211 (local_rank: 3) exitcode : 1 (pid: 1374290) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 1961251) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 129 (local_rank: 1) exitcode : 1 (pid: 3608609) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2135278) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3020754) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3593029) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2017398) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 2981934) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1319939) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 3954905) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/3/error.json traceback : Traceback (most 
recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 408636) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 99 (local_rank: 3) exitcode : 1 (pid: 2637941) error_file: 
/tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1374291) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, 
valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 1981027) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 3784748) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 130 (local_rank: 2) exitcode : 1 (pid: 3608610) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2228408) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3040895) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 2931745) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 187 (local_rank: 3) exitcode : 1 (pid: 1800879) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 420577) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2135279) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 78 (local_rank: 6) exitcode : 1 (pid: 1961252) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 2981935) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 220 (local_rank: 4) exitcode : 1 (pid: 1319940) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 3954906) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3020755) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2637942) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3593030) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 1981028) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/4/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2017399) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3784749) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2228409) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/4/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1891447) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) run(args) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 2931746) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1800880) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 420578) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 408637) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1374292) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset 
= _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3608611) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, 
test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run rank : 238 (local_rank: 6) exitcode : 1 (pid: 3040896) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2135280) error_file: 
/tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : 
jean-zay-iam30-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 79 (local_rank: 7) exitcode : 1 (pid: 1961253) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 
'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1891448) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 2981936) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1319941) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/5/error.json 
traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 3954907) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3020756) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2637943) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 142 (local_rank: 6) exitcode : 1 (pid: 3593031) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 
21 (local_rank: 5) exitcode : 1 (pid: 1981029) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/5/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3784750) error_file: 
/tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3608612) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2228410) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in 
main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 2931747) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 
'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1800881) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 420579) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 262 (local_rank: 6) exitcode : 1 (pid: 408638) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 214 (local_rank: 6) exitcode : 1 (pid: 1374293) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 1961246) error_file: /tmp/torchelastic_u4_ccsv5/none_apupzypy/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 239 (local_rank: 7) exitcode : 1 (pid: 3040897) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 94 (local_rank: 6) exitcode : 1 (pid: 2135281) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 
'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 
2022-09-03_18:58:41 host : jean-zay-iam30-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1891449) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ rank : 246 (local_rank: 6) exitcode : 1 (pid: 2981937) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 
222 (local_rank: 6) exitcode : 1 (pid: 1319942) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 3954908) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 31 (local_rank: 7) exitcode : 1 (pid: 3020757) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 102 (local_rank: 6) exitcode : 1 (pid: 2637944) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 143 (local_rank: 7) exitcode : 1 (pid: 3593032) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 22 (local_rank: 6) exitcode : 1 (pid: 1981030) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ 
rank : 206 (local_rank: 6) exitcode : 1 (pid: 3784751) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3608613) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 86 (local_rank: 6) exitcode : 1 (pid: 2228411) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/6/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam42-ib0 rank : 232 
(local_rank: 0) exitcode : 1 (pid: 3040890) error_file: /tmp/torchelastic_90zkr8k_/none_ryfwz67b/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( rank : 54 (local_rank: 6) exitcode : 1 (pid: 2931748) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 190 (local_rank: 6) exitcode : 1 (pid: 1800882) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 118 (local_rank: 6) exitcode : 1 (pid: 420580) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 408639) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 215 (local_rank: 7) exitcode : 1 (pid: 1374296) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset 
= _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 12 (local_rank: 4) exitcode : 1 (pid: 1891450) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset 
= _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 95 (local_rank: 7) exitcode : 1 (pid: 2135282) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam05-ib0 rank : 24 (local_rank: 0) exitcode : 1 (pid: 3020750) error_file: /tmp/torchelastic_bxlmuqup/none_73bry_i8/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam30-ib0 rank : 136 (local_rank: 0) exitcode : 1 (pid: 3593025) error_file: /tmp/torchelastic_ewcug8cy/none_r788ghox/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent rank : 247 (local_rank: 7) exitcode : 1 (pid: 2981938) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 223 (local_rank: 7) exitcode : 1 (pid: 1319943) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 3954909) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 408632) error_file: /tmp/torchelastic_m9x_grra/none_pne3h_69/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 103 (local_rank: 7) exitcode : 1 (pid: 2637945) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1374287) error_file: /tmp/torchelastic_0xmze9a2/none_p82ylmvb/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, 
in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 23 (local_rank: 7) exitcode : 1 (pid: 1981031) error_file: /tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 207 (local_rank: 7) exitcode : 1 (pid: 3784752) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 134 (local_rank: 6) exitcode : 1 (pid: 3608614) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators rank : 87 (local_rank: 7) exitcode : 1 (pid: 2228412) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' rank : 55 (local_rank: 7) exitcode : 1 (pid: 2931749) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' rank : 191 (local_rank: 7) exitcode : 1 (pid: 1800883) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 119 (local_rank: 7) exitcode : 1 (pid: 420581) 
error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2135275) error_file: /tmp/torchelastic_8x3dfen6/none_khxdiy95/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1891451) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/5/error.json traceback : Traceback (most recent call last): ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam43-ib0 rank : 240 (local_rank: 0) exitcode : 1 (pid: 2981931) error_file: /tmp/torchelastic_vnpgrr_a/none_us6w0t7i/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam40-ib0 rank : 216 (local_rank: 0) exitcode : 1 (pid: 1319936) error_file: /tmp/torchelastic_2q3yk3no/none_rjd9r66h/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 3954902) error_file: /tmp/torchelastic_7dkoz3aw/none_x7p7thmg/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam18-ib0 rank : 96 (local_rank: 0) exitcode : 1 (pid: 2637938) error_file: /tmp/torchelastic__3ae86eo/none_c2w8e34p/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 (pid: 1981024) error_file: 
/tmp/torchelastic_r11ollqe/none__2es3gou/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3784745) error_file: /tmp/torchelastic_ab06ab76/none_aepny2a6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2228405) error_file: /tmp/torchelastic_vac09t1x/none_ap52qdgb/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam08-ib0 rank : 48 (local_rank: 0) exitcode : 1 (pid: 2931742) error_file: /tmp/torchelastic_lnjnjfmh/none_kdpc5rz4/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1800876) error_file: /tmp/torchelastic_snuy6_qo/none_s1cp4mn_/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam26-ib0 rank : 112 (local_rank: 0) exitcode : 1 (pid: 420574) error_file: /tmp/torchelastic_j_jqwl_n/none_8aq7hvb1/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 135 (local_rank: 7) exitcode : 1 (pid: 3608615) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ rank : 14 (local_rank: 6) exitcode : 1 (pid: 1891452) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3608608) error_file: /tmp/torchelastic_hb70z5ko/none_qfbr0tqr/attempt_0/0/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 193 (local_rank: 1) exitcode : 1 (pid: 3153360) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ rank : 15 (local_rank: 7) exitcode : 1 (pid: 1891453) error_file: /tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1891446) error_file: 
/tmp/torchelastic_5yeok41_/none_sjgk_997/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 194 (local_rank: 2) exitcode : 1 (pid: 3153361) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 249073) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3153362) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3153363) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 249074) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) 
AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3153364) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has 
no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 198 (local_rank: 6) exitcode : 1 (pid: 3153365) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 249075) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : 
jean-zay-iam37-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 199 (local_rank: 7) exitcode : 1 (pid: 3153366) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 249076) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/4/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3153359) error_file: /tmp/torchelastic_tgsjd1or/none_lhdsxjh7/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 249077) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 126 (local_rank: 6) exitcode : 1 (pid: 249078) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 127 (local_rank: 7) exitcode : 1 (pid: 249079) error_file: /tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 249072) error_file: 
/tmp/torchelastic_sp0j01ml/none_e5ftz_ik/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1443347) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1443348) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1443349) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1443350) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1443351) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/5/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 
33 (local_rank: 1) exitcode : 1 (pid: 3632157) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 
3632158) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 
'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3632159) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3632160) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3632161) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) 
AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3632162) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has 
no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 39 (local_rank: 7) exitcode : 1 (pid: 3632163) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' elastic_launch( 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 928162) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 1 
(local_rank: 1) exitcode : 1 (pid: 3635524) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 928163) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 3635525) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 928164) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/3/error.json traceback : Traceback (most recent 
call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 928165) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3635526) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3635527) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 928166) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3635528) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) rank : 278 (local_rank: 6) exitcode : 1 (pid: 928167) error_file: 
/tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3635529) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 279 (local_rank: 7) exitcode : 1 (pid: 928168) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3635530) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/7/error.json traceback : Traceback (most recent call last): ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam47-ib0 rank : 272 (local_rank: 0) exitcode : 1 (pid: 928161) error_file: /tmp/torchelastic_beipx1c4/none_2wjynwmn/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ time : 2022-09-03_18:58:41 host : jean-zay-iam02-ib0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 3635523) error_file: /tmp/torchelastic__vzcgj53/none_kbstw24a/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1715071) 
error_file: /tmp/torchelastic_0yj1smbs/none_12j8_dhd/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ rank : 62 (local_rank: 6) exitcode : 1 (pid: 2017400) error_file: 
/tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 63 (local_rank: 7) exitcode : 1 (pid: 2017401) error_file: 
/tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam09-ib0 rank : 
56 (local_rank: 0) exitcode : 1 (pid: 2017394) error_file: /tmp/torchelastic_5zqpufgn/none_oc6an53_/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_18:58:41 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3632156) error_file: /tmp/torchelastic_64zgyl9s/none_i0x3kw38/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 110 (local_rank: 6) exitcode : 1 (pid: 1443352) error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_18:58:41 host : jean-zay-iam19-ib0 rank : 111 (local_rank: 7) exitcode : 1 (pid: 1443353) error_file: 
/tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/7/error.json
  traceback : Traceback (most recent call last):
    File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
      pretrain(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
      train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
      train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
      d = build_dataset_group_gpt(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
      dataset = _build_single_datasets(paths[0],
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
      indexed_dataset = get_indexed_dataset_(data_prefix,
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
      indexed_dataset.sizes.shape[0]))
  AttributeError: 'NoneType' object has no attribute 'sizes'
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-09-03_18:58:41
  host      : jean-zay-iam19-ib0
  rank      : 104 (local_rank: 0)
  exitcode  : 1 (pid: 1443346)
  error_file: /tmp/torchelastic_lxtiwkkk/none__muhbntb/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
      pretrain(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
      train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
      train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
    File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
      d = build_dataset_group_gpt(
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
      dataset = _build_single_datasets(paths[0],
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
      indexed_dataset = get_indexed_dataset_(data_prefix,
    File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
      indexed_dataset.sizes.shape[0]))
  AttributeError: 'NoneType' object has no attribute 'sizes'
============================================================
srun: error: jean-zay-iam09: task 7: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=927268.0
slurmstepd: error: *** STEP 927268.0 ON jean-zay-iam02 CANCELLED AT 2022-09-03T18:58:45 ***
srun: error: jean-zay-iam13: task 9: Exited with exit code 1
srun: error: jean-zay-iam26: task 14: Exited with exit code 1
srun: error: jean-zay-iam05: task 3: Exited with exit code 1
srun: error: jean-zay-iam41: task 28: Exited with exit code 1
srun: error: jean-zay-iam19: task 13: Exited with exit code 1
srun: error: jean-zay-iam32: task 19: Exited with exit code 1
srun: error: jean-zay-iam42: task 29: Exited with exit code 1
srun: error: jean-zay-iam43: task 30: Exited with exit code 1
srun: error: jean-zay-iam35: task 22: Exited with exit code 1
srun: error: jean-zay-iam08: task 6: Exited with exit code 1
srun: error: jean-zay-iam30: task 17: Exited with exit code 1
srun: error: jean-zay-iam38: task 25: Exited with exit code 1
srun: error: jean-zay-iam03: task 1: Exited with exit code 1
srun: error: jean-zay-iam36: task 23: Exited with exit code 1
srun: error: jean-zay-iam47: task 34: Exited with exit code 1
srun: error: jean-zay-iam04: task 2: Exited with exit code 1
srun: error: jean-zay-iam45: task 32: Exited with exit code 1
srun: error: jean-zay-iam11: task 8: Exited with exit code 1
srun: error: jean-zay-iam33: task 20: Exited with exit code 1
srun: error: jean-zay-iam14: task 10: Exited with exit code 1
srun: error: jean-zay-iam44: task 31: Exited with exit code 1
srun: error: jean-zay-iam34: task 21: Exited with exit code 1
srun: error: jean-zay-iam07: task 5: Exited with exit code 1
srun: error: jean-zay-iam40: task 27: Exited with exit code 1
srun: error: jean-zay-iam37: task 24: Exited with exit code 1
srun: error: jean-zay-iam18: task 12: Exited with exit code 1
srun: error: jean-zay-iam06: task 4: Exited with exit code 1
srun: error: jean-zay-iam27: task 15: Exited with exit code 1
srun: error: jean-zay-iam15: task 11: Exited with exit code 1
srun: error: jean-zay-iam31: task 18: Exited with exit code 1
srun: error: jean-zay-iam46: task 33: Exited with exit code 1
srun: error: jean-zay-iam28: task 16: Exited with exit code 1
srun: error: jean-zay-iam39: task 26: Exited with exit code 1
srun: error: jean-zay-iam02: task 0: Exited with exit code 1
srun: error: jean-zay-iam52: task 35: Exited with exit code 1
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[default7]:> setting tensorboard ...
[default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72
[default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type.
[default0]:using torch.bfloat16 for parameters ...
[default0]:------------------------ arguments ------------------------
[default0]: abort_on_unmet_fused_kernel_constraints ......... True
[default0]: accumulate_allreduce_grads_in_fp32 .............. True
[default0]: adam_beta1 ...................................... 0.9
[default0]: adam_beta2 ...................................... 0.95
[default0]: adam_eps ........................................ 1e-08
[default0]: adlr_autoresume ................................. False
[default0]: adlr_autoresume_interval ........................ 1000
[default0]: apply_query_key_layer_scaling ................... True
[default0]: apply_residual_connection_post_layernorm ........ False
[default0]: attention_dropout ............................... 0.1
[default0]: attention_softmax_in_fp32 ....................... False
[default0]: bert_binary_head ................................ True
[default0]: bert_load ....................................... None
[default0]: bf16 ............................................ True
[default0]: bias_dropout_fusion ............................. True
[default0]: bias_gelu_fusion ................................ True
[default0]: biencoder_projection_dim ........................ 0
[default0]: biencoder_shared_query_context_model ............ False
[default0]: block_data_path ................................. None
[default0]: checkpoint_activations .......................... True
[default0]: checkpoint_in_cpu ............................... False
[default0]: checkpoint_num_layers ........................... 1
[default0]: clip_grad ....................................... 1.0
[default0]: codecarbon_dir .................................. None
[default0]: consumed_train_samples .......................... 0
[default0]: consumed_train_tokens ........................... 0
[default0]: consumed_valid_samples .......................... 0
[default0]: contigious_checkpointing ........................ False
[default0]: cpu_optimizer ................................... False
[default0]: cpu_torch_adam .................................. False
[default0]: curriculum_learning ............................. False
[default0]: data_impl ....................................... mmap
[default0]: data_parallel_size .............................. 4
[default0]: data_path ....................................... None
[default0]: dataloader_type ................................. single
[default0]: DDP_impl ........................................ local
[default0]: decoder_seq_length .............................. None
[default0]: deepscale ....................................... False
[default0]: deepscale_config ................................ None
[default0]: deepspeed ....................................... True
[default0]: deepspeed_activation_checkpointing .............. True
[default0]: deepspeed_config ................................ ./ds_config.927326.json
[default0]: deepspeed_mpi ................................... False
[default0]: distribute_checkpointed_activations ............. False
[default0]: distributed_backend ............................. nccl
[default0]: embed_layernorm ................................. True
[default0]: embedding_path .................................. None
[default0]: encoder_seq_length .............................. 2048
[default0]: eod_mask_loss ................................... False
[default0]: eval_interval ................................... 250
[default0]: eval_iters ...................................... 1
[default0]: eval_only ....................................... None
[default0]: evidence_data_path .............................. None
[default0]: exit_duration_in_mins ........................... 5990
[default0]: exit_interval ................................... None
[default0]: ffn_hidden_size ................................. 57344
[default0]: finetune ........................................ False
[default0]: fp16 ............................................ False
[default0]: fp16_lm_cross_entropy ........................... False
[default0]: fp32_residual_connection ........................ False
[default0]: gigaflos_no_embeds .............................. 0
[default0]: global_batch_size ............................... 2048
[default0]: glu_activation .................................. None
[default0]: hidden_dropout .................................. 0.1
[default0]: hidden_size ..................................... 14336
[default0]: hysteresis ...................................... 2
[default0]: ict_head_size ................................... None
[default0]: ict_load ........................................ None
[default0]: img_dim ......................................... 224
[default0]: indexer_batch_size .............................. 128
[default0]: indexer_log_interval ............................ 1000
[default0]: inference ....................................... False
[default0]: init_method_std ................................. 0.0048
[default0]: init_method_xavier_uniform ...................... False
[default0]: initial_loss_scale .............................. 4294967296
[default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf
[default0]: kv_channels ..................................... 128
[default0]: layernorm_epsilon ............................... 1e-05
[default0]: lazy_mpu_init ................................... None
[default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: local_rank ...................................... None
[default0]: log_batch_size_to_tensorboard ................... True
[default0]: log_interval .................................... 1
[default0]: log_learning_rate_to_tensorboard ................ True
[default0]: log_level ....................................... None
[default0]: log_level_replica ............................... None
[default0]: log_loss_scale_to_tensorboard ................... True
[default0]: log_num_zeros_in_grad ........................... False
[default0]: log_params_norm ................................. False
[default0]: log_path ........................................ None
[default0]: log_timers_to_tensorboard ....................... True
[default0]: log_validation_ppl_to_tensorboard ............... True
[default0]: loss_on_targets_only ............................ False
[default0]: loss_scale ...................................... None
[default0]: loss_scale_window ............................... 1000
[default0]: lr .............................................. 2e-05
[default0]: lr_decay_iters .................................. None
[default0]: lr_decay_samples ................................ None
[default0]: lr_decay_style .................................. constant
[default0]: lr_decay_tokens ................................. None
[default0]: lr_warmup_fraction .............................. None
[default0]: lr_warmup_iters ................................. 0
[default0]: lr_warmup_samples ............................... 0
[default0]: make_vocab_size_divisible_by .................... 128
[default0]: mask_prob ....................................... 0.15
[default0]: masked_softmax_fusion ........................... True
[default0]: max_position_embeddings ......................... 2048
[default0]: mean_noise_span_length .......................... None
[default0]: memory_centric_tiled_linear ..................... False
[default0]: merge_file ...................................... None
[default0]: micro_batch_size ................................ 1
[default0]: min_loss_scale .................................. 1.0
[default0]: min_lr .......................................... 0.0
[default0]: mmap_warmup ..................................... False
[default0]: no_load_optim ................................... True
[default0]: no_load_rng ..................................... None
[default0]: no_save_optim ................................... None
[default0]: no_save_rng ..................................... None
[default0]: noise_density ................................... None
[default0]: norm_target_loss ................................ True
[default0]: num_attention_heads ............................. 112
[default0]: num_channels .................................... 3
[default0]: num_classes ..................................... 1000
[default0]: num_layers ...................................... 70
[default0]: num_layers_per_virtual_pipeline_stage ........... None
[default0]: num_workers ..................................... 2
[default0]: onnx_safe ....................................... None
[default0]: openai_gelu ..................................... False
[default0]: optimizer ....................................... adam
[default0]: override_lr_scheduler ........................... False
[default0]: pad_vocab_size_to ............................... 250880
[default0]: params_dtype .................................... torch.bfloat16
[default0]: partition_activations ........................... False
[default0]: patch_dim ....................................... 16
[default0]: pipeline_model_parallel_size .................... 72
[default0]: position_embedding_type ......................... PositionEmbeddingType.alibi
[default0]: pp_partition_method ............................. type:transformer|embedding
[default0]: prefixlm ........................................ False
[default0]: profile_backward ................................ False
[default0]: query_in_block_prob ............................. 0.1
[default0]: rampup_batch_size ............................... None
[default0]: rank ............................................ 0
[default0]: remote_device ................................... none
[default0]: reset_attention_mask ............................ False
[default0]: reset_position_ids .............................. False
[default0]: reset_progress .................................. True
[default0]: retriever_report_topk_accuracies ................ []
[default0]: retriever_score_scaling ......................... False
[default0]: retriever_seq_length ............................ 256
[default0]: reweight_loss_based_on_position_frequency ....... False
[default0]: sample_rate ..................................... 1.0
[default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: save_interval ................................... 5
[default0]: scatter_gather_tensors_in_pipeline .............. True
[default0]: scattered_embeddings ............................ False
[default0]: seed ............................................ 42
[default0]: seq_length ...................................... 2048
[default0]: sgd_momentum .................................... 0.9
[default0]: short_seq_prob .................................. 0.1
[default0]: skip_train_iteration_range ...................... None
[default0]: split ........................................... None
[default0]: split_transformers .............................. False
[default0]: sync_tp_duplicated_parameters ................... True
[default0]: synchronize_each_layer .......................... False
[default0]: tensor_model_parallel_size ...................... 1
[default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq
[default0]: tensorboard_log_interval ........................ 1
[default0]: tensorboard_queue_size .......................... 5
[default0]: test_weighted_split_paths ....................... None
[default0]: test_weighted_split_paths_path .................. None
[default0]: tile_factor ..................................... 1
[default0]: titles_data_path ................................ None
[default0]: tokenizer_name_or_path .......................... bigscience/tokenizer
[default0]: tokenizer_type .................................. PretrainedFromHF
[default0]: train_iters ..................................... None
[default0]: train_samples ................................... 6348800
[default0]: train_tokens .................................... None
[default0]: train_weighted_split_names ...................... ['train']
[default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']]
[default0]: train_weighted_split_paths_path ................. None
[default0]: train_weighted_split_splits ..................... [['0:1']]
[default0]: train_weighted_split_weights .................... [['1']]
[default0]: universal_checkpoint ............................ True
[default0]: use_bnb_optimizer ............................... False
[default0]: use_checkpoint_lr_scheduler ..................... False
[default0]: use_contiguous_buffers_in_ddp ................... True
[default0]: use_cpu_initialization .......................... None
[default0]: use_one_sent_docs ............................... False
[default0]: use_pin_memory .................................. False
[default0]: valid_num_workers ............................... 2
[default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid_ar', 'valid_ca', 'valid_code', 'valid_en', 'valid_es', 'valid_eu', 'valid_fr', 'valid_id', 'valid_indic-as', 'valid_indic-bn', 'valid_indic-gu', 'valid_indic-hi', 'valid_indic-kn', 'valid_indic-ml', 'valid_indic-mr', 'valid_indic-ne', 'valid_indic-or', 'valid_indic-pa', 'valid_indic-ta', 'valid_indic-te', 'valid_indic-ur', 'valid_nigercongo-all', 'valid_oscar-en', 'valid_oscar-zh', 'valid_pt', 'valid_vi', 'valid_zhs', 'valid_zht', 'valid']
[default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document',
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document'], 
['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document'], 
['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document'], 
['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document'], ['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... 
[['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... [['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... 
None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-03 19:19:47,801] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-03 19:20:06,911] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.094 seconds [default0]:> compiling and loading fused kernels ... 
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module scaled_upper_triang_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module scaled_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module scaled_masked_softmax_cuda...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module fused_mix_prec_layer_norm_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module fused_mix_prec_layer_norm_cuda...
[default0]:>>> done with compiling and loading fused kernels. Compilation time: 6.623 seconds
[default0]:time to initialize megatron (seconds): 77.631
[default0]:[after megatron is initialized] datetime: 2022-09-03 19:20:13
[default0]:building GPT model ...
[default0]:[2022-09-03 19:20:13,677] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-03 19:20:13,677] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-03 19:20:13,677] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.97 GB, percent = 7.1% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-03 19:20:17,557] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe 
[default0]:stage=7 layers=1
[default0]: 9: ParallelTransformerLayerPipe
[default0]:stage=8 layers=1
[default0]: 10: ParallelTransformerLayerPipe
[default0]:stage=9 layers=1
[default0]: 11: ParallelTransformerLayerPipe
[default0]:stage=10 layers=1
[default0]: 12: ParallelTransformerLayerPipe
[default0]:stage=11 layers=1
[default0]: 13: ParallelTransformerLayerPipe
[default0]:stage=12 layers=1
[default0]: 14: ParallelTransformerLayerPipe
[default0]:stage=13 layers=1
[default0]: 15: ParallelTransformerLayerPipe
[default0]:stage=14 layers=1
[default0]: 16: ParallelTransformerLayerPipe
[default0]:stage=15 layers=1
[default0]: 17: ParallelTransformerLayerPipe
[default0]:stage=16 layers=1
[default0]: 18: ParallelTransformerLayerPipe
[default0]:stage=17 layers=1
[default0]: 19: ParallelTransformerLayerPipe
[default0]:stage=18 layers=1
[default0]: 20: ParallelTransformerLayerPipe
[default0]:stage=19 layers=1
[default0]: 21: ParallelTransformerLayerPipe
[default0]:stage=20 layers=1
[default0]: 22: ParallelTransformerLayerPipe
[default0]:stage=21 layers=1
[default0]: 23: ParallelTransformerLayerPipe
[default0]:stage=22 layers=1
[default0]: 24: ParallelTransformerLayerPipe
[default0]:stage=23 layers=1
[default0]: 25: ParallelTransformerLayerPipe
[default0]:stage=24 layers=1
[default0]: 26: ParallelTransformerLayerPipe
[default0]:stage=25 layers=1
[default0]: 27: ParallelTransformerLayerPipe
[default0]:stage=26 layers=1
[default0]: 28: ParallelTransformerLayerPipe
[default0]:stage=27 layers=1
[default0]: 29: ParallelTransformerLayerPipe
[default0]:stage=28 layers=1
[default0]: 30: ParallelTransformerLayerPipe
[default0]:stage=29 layers=1
[default0]: 31: ParallelTransformerLayerPipe
[default0]:stage=30 layers=1
[default0]: 32: ParallelTransformerLayerPipe
[default0]:stage=31 layers=1
[default0]: 33: ParallelTransformerLayerPipe
[default0]:stage=32 layers=1
[default0]: 34: ParallelTransformerLayerPipe
[default0]:stage=33 layers=1
[default0]: 35: ParallelTransformerLayerPipe
[default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe 
[default0]:stage=61 layers=1 [default0]: 63: ParallelTransformerLayerPipe [default0]:stage=62 layers=1 [default0]: 64: ParallelTransformerLayerPipe [default0]:stage=63 layers=1 [default0]: 65: ParallelTransformerLayerPipe [default0]:stage=64 layers=1 [default0]: 66: ParallelTransformerLayerPipe [default0]:stage=65 layers=1 [default0]: 67: ParallelTransformerLayerPipe [default0]:stage=66 layers=1 [default0]: 68: ParallelTransformerLayerPipe [default0]:stage=67 layers=1 [default0]: 69: ParallelTransformerLayerPipe [default0]:stage=68 layers=1 [default0]: 70: ParallelTransformerLayerPipe [default0]:stage=69 layers=1 [default0]: 71: ParallelTransformerLayerPipe [default0]:stage=70 layers=3 [default0]: 72: ParallelTransformerLayerPipe [default0]: 73: undo [default0]: 74: MixedFusedLayerNorm [default0]:stage=71 layers=2 [default0]: 75: EmbeddingPipe [default0]: 76: float16_to_fp32 [default0]: loss: CrossEntropy [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default2]:Building extension module utils... [default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default4]:Loading extension module utils... [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default6]:Loading extension module utils...
[default2]:ninja: no work to do. [default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.34581494331359863 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0011513233184814453 seconds
[default7]:Time to load utils op: 0.40961694717407227 seconds [default4]:Time to load utils op: 0.41037964820861816 seconds [default5]:Time to load utils op: 0.41002321243286133 seconds [default6]:Time to load utils op: 0.40944409370422363 seconds
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00052642822265625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000514984130859375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0004935264587402344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004954338073730469 seconds [default3]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default3]:Building extension module utils... [default3]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21180510520935059 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20985794067382812 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3180091381072998 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31761741638183594 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30625271797180176 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20228195190429688 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20266294479370117 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20583534240722656 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20271682739257812 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.30617475509643555 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31822633743286133 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3058052062988281 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20292043685913086 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30584216117858887 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30586791038513184 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2033240795135498 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20321059226989746 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30582356452941895 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20333433151245117 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3181304931640625 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20299434661865234 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20965075492858887 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20244407653808594 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21085309982299805 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20249509811401367 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20544219017028809 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20267558097839355 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.20265650749206543 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2058572769165039 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20323443412780762 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30631327629089355 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3062291145324707 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20397043228149414 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20162343978881836 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30556678771972656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20305657386779785 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20345640182495117 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20258092880249023 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20306682586669922 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20609641075134277 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20610380172729492 seconds [default3]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20608115196228027 seconds [default3]:Time to load utils op: 0.20255279541015625 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30542778968811035 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20608234405517578 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3055129051208496 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20265913009643555 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20582818984985352 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21086335182189941 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027585506439209 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20941877365112305 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2025916576385498 seconds [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20989441871643066 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3055078983306885 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20989441871643066 seconds [default1]:Loading extension module utils... [default0]:Time to load utils op: 0.20988965034484863 seconds [default1]:Time to load utils op: 0.20989608764648438 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20481252670288086 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20482182502746582 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20275425910949707 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20244908332824707 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20250439643859863 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2095353603363037 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.20952939987182617 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30914878845214844 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.202561616897583 seconds [default3]:ninja: no work to do. [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.27248287200927734 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3091902732849121 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20407652854919434 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20325231552124023 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3092350959777832 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2034897804260254 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2037663459777832 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21029901504516602 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20602631568908691 seconds [default4]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2026216983795166 seconds [default4]:Time to load utils op: 0.20711803436279297 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2024078369140625 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2025163173675537 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20625877380371094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20256662368774414 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.20244336128234863 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20874452590942383 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20841026306152344 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20609498023986816 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20857810974121094 seconds [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20875763893127441 seconds [default5]:Time to load utils op: 0.2024974822998047 seconds [default7]:Time to load utils op: 0.20235896110534668 seconds [default2]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2072458267211914 seconds [default2]:Time to load utils op: 0.2069716453552246 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20622634887695312 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20854663848876953 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20603585243225098 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2060401439666748 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20446515083312988 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20251798629760742 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21047472953796387 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2025899887084961 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.21032142639160156 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2059614658355713 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20596885681152344 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20476722717285156 seconds [default5]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2025620937347412 seconds [default5]:Time to load utils op: 0.20834946632385254 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20534467697143555 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2024247646331787 seconds [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21283984184265137 seconds [default2]:Time to load utils op: 0.21285629272460938 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21281719207763672 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20236468315124512 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2022535800933838 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20489215850830078 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20443153381347656 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.215576171875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20807385444641113 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20241332054138184 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.21186423301696777 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2080686092376709 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20272445678710938 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024989128112793 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2155599594116211 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2155625820159912 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21556568145751953 seconds [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20775365829467773 seconds [default1]:Time to load utils op: 0.20242762565612793 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2024385929107666 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30585217475891113 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3056349754333496 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20239925384521484 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3058352470397949 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20239996910095215 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2064657211303711 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20650219917297363 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2064986228942871 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.2025907039642334 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30538034439086914 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20244383811950684 seconds [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2095623016357422 seconds [default6]:Time to load utils op: 0.2025907039642334 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.208604097366333 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20793938636779785 seconds [default2]:Time to load utils op: 0.20945191383361816 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2026383876800537 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20829296112060547 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20781612396240234 seconds [default4]:Time to load utils op: 0.2026653289794922 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20282483100891113 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20255184173583984 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20864629745483398 seconds [default2]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20340967178344727 seconds [default4]:Loading extension module utils... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.20236539840698242 seconds [default4]:Time to load utils op: 0.20289015769958496 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3092830181121826 seconds [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3092806339263916 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30916690826416016 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2032332420349121 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20870327949523926 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3092834949493408 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20266222953796387 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2033696174621582 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20262360572814941 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20250821113586426 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20271587371826172 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2027134895324707 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2025165557861328 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20830416679382324 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20817327499389648 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20826435089111328 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20545530319213867 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20457029342651367 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20247244834899902 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20839810371398926 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2128450870513916 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20598626136779785 seconds [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.202592134475708 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20645952224731445 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21284222602844238 seconds [default7]:Time to load utils op: 0.20243525505065918 seconds [default4]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20229268074035645 seconds [default4]:Time to load utils op: 0.20647621154785156 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21175026893615723 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20603251457214355 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20266056060791016 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2062077522277832 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20262670516967773 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.20771145820617676 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.218186616897583 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2166447639465332 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2172248363494873 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21721363067626953 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2063426971435547 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21283197402954102 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21031761169433594 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20775437355041504 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20255303382873535 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20557904243469238 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20264935493469238 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2077786922454834 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2077345848083496 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027595043182373 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2074592113494873 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20605063438415527 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20793628692626953 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.20599579811096191 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21173095703125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2077183723449707 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20833539962768555 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20269513130187988 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20224761962890625 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20791101455688477 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20264506340026855 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3052647113800049 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20967650413513184 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21010589599609375 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20237278938293457 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2098677158355713 seconds [default6]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20353388786315918 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30791497230529785 seconds [default6]:Time to load utils op: 0.20267677307128906 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20258116722106934 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20276570320129395 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3078176975250244 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20262646675109863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20253229141235352 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024831771850586 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3052642345428467 seconds [default0]:[2022-09-03 19:20:19,293] [INFO] [utils.py:827:see_memory_usage] After Building Model [default0]:[2022-09-03 19:20:19,293] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:20:19,294] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.44 GB, percent = 7.2% [default0]:setting training iterations to 3100 [default0]:> learning rate decay style: constant [default0]:DeepSpeed is enabled. [default0]:[2022-09-03 19:20:19,294] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20750999450683594 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20262718200683594 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20602178573608398 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20608139038085938 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2062370777130127 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20264887809753418 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20252442359924316 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.20625519752502441 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3052504062652588 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3052840232849121 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2025468349456787 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20249271392822266 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20245647430419922 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20284199714660645 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20256757736206055 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20751690864562988 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20693421363830566 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20232200622558594 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20795345306396484 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20309185981750488 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2066645622253418 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20914149284362793 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.209089994430542 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20224452018737793 seconds [default6]:Loading extension module utils... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.20265555381774902 seconds [default6]:Time to load utils op: 0.20786285400390625 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20269536972045898 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20911645889282227 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2023484706878662 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20215916633605957 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2026350498199463 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2128465175628662 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21283388137817383 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20912790298461914 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20264720916748047 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20350003242492676 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20272278785705566 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2024073600769043 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20264816284179688 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005598068237304688 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003249645233154297 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005950927734375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00032639503479003906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007300376892089844 seconds [default7]:Time to load utils op: 0.0004630088806152344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005939006805419922 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006480216979980469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007104873657226562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004990100860595703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000446319580078125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004584789276123047 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004374980926513672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004048347473144531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006825923919677734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004382133483886719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005552768707275391 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006999969482421875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003552436828613281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005137920379638672 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00041031837463378906 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004544258117675781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004563331604003906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00047469139099121094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Time to load utils op: 0.0004897117614746094 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003418922424316406 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30784010887145996 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000675201416015625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005743503570556641 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.7235498428344727 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006291866302490234 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00045490264892578125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006852149963378906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007503032684326172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005154609680175781 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004944801330566406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00044727325439453125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005235671997070312 seconds [default1]:Time to load utils op: 0.3078577518463135 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005118846893310547 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005114078521728516 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003635883331298828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005359649658203125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007715225219726562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004513263702392578 seconds [default2]:Time to load utils op: 0.3052973747253418 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0031392574310302734 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.7234258651733398 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00043654441833496094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.7235527038574219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00046706199645996094 seconds [default0]:Time to load utils op: 0.30535244941711426 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Time to load utils op: 0.00045037269592285156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.002980947494506836 seconds [default4]:Time to load utils op: 0.0027723312377929688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008573532104492188 seconds [default1]:Time to load utils op: 0.30539417266845703 seconds [default3]:Time to load utils op: 0.3050384521484375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004374980926513672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.003074169158935547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.00070953369140625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008006095886230469 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007307529449462891 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004436969757080078 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005543231964111328 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006914138793945312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0007469654083251953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006239414215087891 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00042510032653808594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00044608116149902344 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005066394805908203 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006153583526611328 seconds [default2]:Time to load utils op: 0.0006859302520751953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006322860717773438 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004353523254394531 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0017731189727783203 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0016696453094482422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0014600753784179688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007164478302001953 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003414154052734375 seconds [default6]:Time to load utils op: 0.0004782676696777344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006093978881835938 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00046825408935546875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008189678192138672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007717609405517578 seconds [default7]:Time to load utils op: 0.0007371902465820312 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008301734924316406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000438690185546875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004413127899169922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004038810729980469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0005292892456054688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004839897155761719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005085468292236328 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004596710205078125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004341602325439453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006167888641357422 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004260540008544922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0015156269073486328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006244182586669922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004303455352783203 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005147457122802734 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003910064697265625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005354881286621094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004665851593017578 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00039887428283691406 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003876686096191406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00043463706970214844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000492095947265625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0004544258117675781 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00046181678771972656 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004000663757324219 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003516674041748047 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00040268898010253906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006439685821533203 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0004734992980957031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009925365447998047 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004754066467285156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004177093505859375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006933212280273438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045609474182128906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0004603862762451172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00045418739318847656 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004379749298095703 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00038123130798339844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006186962127685547 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005562305450439453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0006968975067138672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004622936248779297 seconds [default1]:Time to load utils op: 0.00040221214294433594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00045871734619140625 seconds [default6]:Time to load utils op: 0.0006122589111328125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00046253204345703125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.00046706199645996094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00043010711669921875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0012099742889404297 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005075931549072266 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006892681121826172 seconds [default3]:Time to load utils op: 0.0006773471832275391 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0005426406860351562 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005424022674560547 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005340576171875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007183551788330078 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004699230194091797 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00048828125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0006163120269775391 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006418228149414062 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0011448860168457031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006914138793945312 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005650520324707031 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005550384521484375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0009765625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006246566772460938 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005948543548583984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004291534423828125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007033348083496094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005369186401367188 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.00103759765625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000637054443359375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004410743713378906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005323886871337891 seconds [default5]:Time to load utils op: 0.0006172657012939453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00038743019104003906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004520416259765625 seconds [default2]:Time to load utils op: 0.000423431396484375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00041961669921875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007297992706298828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009555816650390625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0005757808685302734 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0005939006805419922 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005562305450439453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009644031524658203 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006966590881347656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010445117950439453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0007243156433105469 seconds [default2]:Time to load utils op: 0.0006911754608154297 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005276203155517578 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005829334259033203 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004904270172119141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005037784576416016 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0004818439483642578 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006570816040039062 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006253719329833984 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005164146423339844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006718635559082031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006248950958251953 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005397796630859375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007681846618652344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Time to load utils op: 0.0012066364288330078 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006873607635498047 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00038504600524902344 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006089210510253906 seconds [default7]:Time to load utils op: 0.00069427490234375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006635189056396484 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043082237243652344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045609474182128906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00046181678771972656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004413127899169922 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004057884216308594 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00046443939208984375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00038051605224609375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005435943603515625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004775524139404297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004837512969970703 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00044798851013183594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001046895980834961 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010120868682861328 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010814666748046875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012171268463134766 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008895397186279297 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009069442749023438 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003898143768310547 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006806850433349609 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003333091735839844 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00039696693420410156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006155967712402344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00040531158447265625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003886222839355469 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005688667297363281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005185604095458984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005452632904052734 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007851123809814453 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005400180816650391 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004291534423828125 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007939338684082031 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006635189056396484 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005249977111816406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006220340728759766 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005865097045898438 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default4]:Time to load utils op: 0.0004405975341796875 seconds [default1]:Time to load utils op: 0.0005486011505126953 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003483295440673828 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005629062652587891 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004086494445800781 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005712509155273438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000568389892578125 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004038810729980469 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00043845176696777344 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005772113800048828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004990100860595703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00043010711669921875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005245208740234375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010175704956054688 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007255077362060547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004885196685791016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005452632904052734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00039076805114746094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004658699035644531 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007295608520507812 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005440711975097656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004947185516357422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006964206695556641 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005917549133300781 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041747093200683594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045418739318847656 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004565715789794922 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005524158477783203 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004394054412841797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0004317760467529297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00045943260192871094 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005066394805908203 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004620552062988281 seconds [default1]:Time to load utils op: 0.0003159046173095703 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004169940948486328 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0006322860717773438 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009367465972900391 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009098052978515625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Time to load utils op: 0.0009920597076416016 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008938312530517578 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005936622619628906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Time to load utils op: 0.000682830810546875 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009386539459228516 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005824565887451172 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006163120269775391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000720977783203125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005314350128173828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005519390106201172 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010361671447753906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007493495941162109 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005526542663574219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.0006926059722900391 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006227493286132812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0010428428649902344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005903244018554688 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005669593811035156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:[2022-09-03 19:20:20,017] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 19:20:20,017] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 19:20:20,017] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 19:20:20,017] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 19:20:20,017] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default0]:[2022-09-03 19:20:20,046] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 19:20:20,046] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:20:20,046] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default3]:Building extension module utils... [default3]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2063300609588623 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20598578453063965 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.20557284355163574 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20560288429260254 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2022559642791748 seconds [default3]:ninja: no work to do. [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.22226333618164062 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3045816421508789 seconds [default0]:[2022-09-03 19:20:20,276] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-03 19:20:20,276] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:20:20,276] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.6 GB, percent = 7.3% [default0]:[2022-09-03 19:20:20,333] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 19:20:20,334] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:20:20,334] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.6 GB, percent = 7.3% [default0]:[2022-09-03 19:20:20,359] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 19:20:20,360] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:20:20,360] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.6 GB, percent = 7.3% [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3045926094055176 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005562305450439453 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003616809844970703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0016498565673828125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0016422271728515625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0015444755554199219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0015215873718261719 seconds [default0]:[2022-09-03 19:20:20,386] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 19:20:20,386] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:20:20,387] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:[2022-09-03 19:20:20,412] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 19:20:20,413] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:20:20,413] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.00037789344787597656 seconds [default0]:[2022-09-03 19:20:20,481] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 19:20:20,481] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:20:20,482] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:[2022-09-03 19:20:20,508] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 19:20:20,508] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:20:20,508] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:[2022-09-03 19:20:20,508] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-03 19:20:20,508] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 19:20:20,508] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 19:20:20,508] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 19:20:20,509] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-03 19:20:20,510] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004634857177734375 seconds [default0]:[2022-09-03 19:20:20,511] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:21,073] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:20:22,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:20:22,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:20:22,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:20:22,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:20:22,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:20:22,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:20:22,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:20:22,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:20:22,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:20:22,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:20:22,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:20:31,818] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default3]:[2022-09-03 19:20:31,912] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default0]:[2022-09-03 19:20:32,971] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default1]:[2022-09-03 19:20:33,301] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default0]:[2022-09-03 19:20:33,283] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default5]:[2022-09-03 19:20:33,387] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default2]:[2022-09-03 19:20:33,433] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default4]:[2022-09-03 19:20:33,420] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default3]:[2022-09-03 19:20:33,439] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default7]:[2022-09-03 19:20:33,730] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default6]:[2022-09-03 19:20:33,725] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default3]:[2022-09-03 19:20:33,842] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default7]:[2022-09-03 19:20:33,942] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default2]:[2022-09-03 19:20:33,909] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default6]:[2022-09-03 19:20:33,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default1]:[2022-09-03 19:20:34,048] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default2]:[2022-09-03 19:20:34,334] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default3]:[2022-09-03 19:20:34,532] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default7]:[2022-09-03 19:20:34,606] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default3]:[2022-09-03 19:20:34,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default0]:[2022-09-03 19:20:34,743] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default5]:[2022-09-03 19:20:34,728] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default0]:[2022-09-03 19:20:34,833] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default3]:[2022-09-03 19:20:34,812] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default3]:[2022-09-03 19:20:34,815] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default7]:[2022-09-03 19:20:34,799] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default6]:[2022-09-03 19:20:34,793] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default6]:[2022-09-03 19:20:34,882] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default7]:[2022-09-03 19:20:34,888] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default2]:[2022-09-03 19:20:34,983] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default3]:[2022-09-03 19:20:35,185] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default6]:[2022-09-03 19:20:35,255] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default4]:[2022-09-03 19:20:35,286] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default2]:[2022-09-03 19:20:35,287] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default2]:[2022-09-03 19:20:35,360] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default3]:[2022-09-03 19:20:35,381] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default2]:[2022-09-03 19:20:35,507] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default7]:[2022-09-03 19:20:35,501] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default2]:[2022-09-03 19:20:35,514] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default7]:[2022-09-03 19:20:35,504] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default3]:[2022-09-03 19:20:35,604] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default4]:[2022-09-03 19:20:35,692] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default7]:[2022-09-03 19:20:35,750] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default6]:[2022-09-03 19:20:35,753] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default7]:[2022-09-03 19:20:35,736] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default3]:[2022-09-03 19:20:35,761] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default6]:[2022-09-03 19:20:35,739] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default3]:[2022-09-03 19:20:35,763] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default3]:[2022-09-03 19:20:35,832] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default6]:[2022-09-03 19:20:35,941] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default7]:[2022-09-03 19:20:35,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default4]:[2022-09-03 19:20:35,941] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default7]:[2022-09-03 19:20:36,164] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default7]:[2022-09-03 19:20:36,105] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default5]:[2022-09-03 19:20:36,096] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default4]:[2022-09-03 19:20:36,096] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default7]:[2022-09-03 19:20:36,248] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default6]:[2022-09-03 19:20:36,377] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default3]:[2022-09-03 19:20:36,317] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default7]:[2022-09-03 19:20:36,353] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default2]:[2022-09-03 19:20:36,307] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default7]:[2022-09-03 19:20:36,456] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default6]:[2022-09-03 19:20:36,424] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default5]:[2022-09-03 19:20:36,540] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default4]:[2022-09-03 19:20:36,541] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default5]:[2022-09-03 19:20:36,636] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default3]:[2022-09-03 19:20:36,631] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default3]:[2022-09-03 19:20:36,650] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default7]:[2022-09-03 19:20:36,673] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default6]:[2022-09-03 19:20:36,669] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default7]:[2022-09-03 19:20:36,696] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default3]:[2022-09-03 19:20:36,742] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default1]:[2022-09-03 19:20:36,751] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default3]:[2022-09-03 19:20:36,728] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default3]:[2022-09-03 19:20:36,803] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default2]:[2022-09-03 19:20:36,859] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default0]:[2022-09-03 19:20:36,840] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default7]:[2022-09-03 19:20:36,834] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default7]:[2022-09-03 19:20:36,941] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default6]:[2022-09-03 19:20:36,932] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default7]:[2022-09-03 19:20:36,943] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default7]:[2022-09-03 19:20:36,950] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default6]:[2022-09-03 19:20:36,945] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default7]:[2022-09-03 19:20:36,918] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default3]:[2022-09-03 19:20:37,054] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default4]:[2022-09-03 19:20:37,040] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default5]:[2022-09-03 19:20:37,037] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default7]:[2022-09-03 19:20:37,077] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default6]:[2022-09-03 19:20:37,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default1]:[2022-09-03 19:20:37,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default5]:[2022-09-03 19:20:37,164] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default4]:[2022-09-03 19:20:37,155] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default7]:[2022-09-03 19:20:37,142] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default6]:[2022-09-03 19:20:37,140] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default2]:[2022-09-03 19:20:37,177] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default1]:[2022-09-03 19:20:37,235] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default2]:[2022-09-03 19:20:37,236] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default3]:[2022-09-03 19:20:37,246] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default3]:[2022-09-03 19:20:37,230] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default2]:[2022-09-03 19:20:37,238] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default4]:[2022-09-03 19:20:37,289] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default5]:[2022-09-03 19:20:37,293] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default1]:[2022-09-03 19:20:37,290] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default6]:[2022-09-03 19:20:37,376] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default1]:[2022-09-03 19:20:37,384] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default1]:[2022-09-03 19:20:37,302] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-03 19:20:37,298] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default6]:[2022-09-03 19:20:37,300] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default2]:[2022-09-03 19:20:37,317] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default1]:[2022-09-03 19:20:37,411] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default0]:[2022-09-03 19:20:37,405] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default2]:[2022-09-03 19:20:37,412] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default1]:[2022-09-03 19:20:37,438] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default0]:[2022-09-03 19:20:37,454] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default3]:[2022-09-03 19:20:37,410] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default0]:[2022-09-03 19:20:37,446] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default3]:[2022-09-03 19:20:37,433] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default7]:[2022-09-03 19:20:37,446] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default3]:[2022-09-03 19:20:37,400] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default1]:[2022-09-03 19:20:37,431] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default6]:[2022-09-03 19:20:37,459] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default0]:[2022-09-03 19:20:37,435] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default5]:[2022-09-03 19:20:37,478] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default0]:[2022-09-03 19:20:37,579] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default0]:[2022-09-03 19:20:37,517] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default7]:[2022-09-03 19:20:37,563] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default2]:[2022-09-03 19:20:37,552] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default3]:[2022-09-03 19:20:37,554] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default7]:[2022-09-03 19:20:37,545] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default2]:[2022-09-03 19:20:37,559] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default6]:[2022-09-03 19:20:37,678] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default7]:[2022-09-03 19:20:37,592] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default3]:[2022-09-03 19:20:37,641] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default6]:[2022-09-03 19:20:37,644] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default3]:[2022-09-03 19:20:37,606] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default2]:[2022-09-03 19:20:37,612] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default7]:[2022-09-03 19:20:37,652] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default5]:[2022-09-03 19:20:37,637] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default7]:[2022-09-03 19:20:37,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default7]:[2022-09-03 19:20:37,611] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default6]:[2022-09-03 19:20:37,694] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default5]:[2022-09-03 19:20:37,702] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default7]:[2022-09-03 19:20:37,687] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default7]:[2022-09-03 19:20:37,772] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default5]:[2022-09-03 19:20:37,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default4]:[2022-09-03 19:20:37,736] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default6]:[2022-09-03 19:20:37,882] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default2]:[2022-09-03 19:20:37,812] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default2]:[2022-09-03 19:20:37,813] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default5]:[2022-09-03 19:20:37,884] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default4]:[2022-09-03 19:20:37,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default0]:[2022-09-03 19:20:37,850] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default1]:[2022-09-03 19:20:37,847] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default2]:[2022-09-03 19:20:37,883] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default6]:[2022-09-03 19:20:37,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default4]:[2022-09-03 19:20:37,920] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default3]:[2022-09-03 19:20:37,885] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default1]:[2022-09-03 19:20:37,888] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default7]:[2022-09-03 19:20:37,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default5]:[2022-09-03 19:20:37,918] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default5]:[2022-09-03 19:20:37,982] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default5]:[2022-09-03 19:20:37,906] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default4]:[2022-09-03 19:20:37,901] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default4]:[2022-09-03 19:20:37,970] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default2]:[2022-09-03 19:20:37,965] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default3]:[2022-09-03 19:20:38,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default5]:[2022-09-03 19:20:37,985] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default4]:[2022-09-03 19:20:38,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default6]:[2022-09-03 19:20:38,020] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default1]:[2022-09-03 19:20:38,054] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default0]:[2022-09-03 19:20:38,055] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default4]:[2022-09-03 19:20:38,007] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default5]:[2022-09-03 19:20:38,013] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default2]:[2022-09-03 19:20:38,059] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default6]:[2022-09-03 19:20:38,026] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default5]:[2022-09-03 19:20:38,063] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default4]:[2022-09-03 19:20:38,056] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default2]:[2022-09-03 19:20:38,156] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default0]:[2022-09-03 19:20:38,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default1]:[2022-09-03 19:20:38,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default0]:[2022-09-03 19:20:38,090] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default2]:[2022-09-03 19:20:38,118] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default1]:[2022-09-03 19:20:38,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default0]:[2022-09-03 19:20:38,162] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default1]:[2022-09-03 19:20:38,156] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default4]:[2022-09-03 19:20:38,151] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default1]:[2022-09-03 19:20:38,142] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default0]:[2022-09-03 19:20:38,160] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default5]:[2022-09-03 19:20:38,172] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default5]:[2022-09-03 19:20:38,213] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default0]:[2022-09-03 19:20:38,282] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default0]:[2022-09-03 19:20:38,227] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default4]:[2022-09-03 19:20:38,254] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default5]:[2022-09-03 19:20:38,259] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default4]:[2022-09-03 19:20:38,282] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default6]:[2022-09-03 19:20:38,242] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default5]:[2022-09-03 19:20:38,284] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default4]:[2022-09-03 19:20:38,289] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default6]:[2022-09-03 19:20:38,294] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default4]:[2022-09-03 19:20:38,340] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default1]:[2022-09-03 19:20:38,314] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default1]:[2022-09-03 19:20:38,344] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default3]:[2022-09-03 19:20:38,335] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default0]:[2022-09-03 19:20:38,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default2]:[2022-09-03 19:20:38,334] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default2]:[2022-09-03 19:20:38,447] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default3]:[2022-09-03 19:20:38,474] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default1]:[2022-09-03 19:20:38,476] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default0]:[2022-09-03 19:20:38,455] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default1]:[2022-09-03 19:20:38,451] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default1]:[2022-09-03 19:20:38,483] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default4]:[2022-09-03 19:20:38,411] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default5]:[2022-09-03 19:20:38,389] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default4]:[2022-09-03 19:20:38,390] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default5]:[2022-09-03 19:20:38,412] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default1]:[2022-09-03 19:20:38,439] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default2]:[2022-09-03 19:20:38,401] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default0]:[2022-09-03 19:20:38,536] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default2]:[2022-09-03 19:20:38,577] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default1]:[2022-09-03 19:20:38,533] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default6]:[2022-09-03 19:20:38,492] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 14 [default2]:[2022-09-03 19:20:38,549] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default3]:[2022-09-03 19:20:38,522] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default2]:[2022-09-03 19:20:38,532] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default0]:[2022-09-03 19:20:38,508] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default0]:[2022-09-03 19:20:38,560] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default5]:[2022-09-03 19:20:38,494] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default2]:[2022-09-03 19:20:38,495] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default1]:[2022-09-03 19:20:38,574] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default0]:[2022-09-03 19:20:38,570] [INFO] [engine.py:2763:_load_zero_checkpoint] 
loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default6]:[2022-09-03 19:20:38,495] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default6]:[2022-09-03 19:20:38,500] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default2]:[2022-09-03 19:20:38,587] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default5]:[2022-09-03 19:20:38,546] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default1]:[2022-09-03 19:20:38,510] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default5]:[2022-09-03 19:20:38,532] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default6]:[2022-09-03 19:20:38,568] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default1]:[2022-09-03 19:20:38,560] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default4]:[2022-09-03 19:20:38,544] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default6]:[2022-09-03 19:20:38,584] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default1]:[2022-09-03 19:20:38,673] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default2]:[2022-09-03 19:20:38,672] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default4]:[2022-09-03 19:20:38,681] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default5]:[2022-09-03 19:20:38,676] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default5]:[2022-09-03 19:20:38,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default1]:[2022-09-03 19:20:38,708] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default0]:[2022-09-03 19:20:38,718] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default6]:[2022-09-03 19:20:38,729] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default6]:[2022-09-03 19:20:38,693] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default5]:[2022-09-03 19:20:38,743] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default4]:[2022-09-03 19:20:38,750] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default0]:[2022-09-03 19:20:38,868] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default4]:[2022-09-03 19:20:38,807] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default4]:[2022-09-03 19:20:38,823] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default5]:[2022-09-03 19:20:38,824] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default0]:[2022-09-03 19:20:38,824] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default1]:[2022-09-03 19:20:38,883] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default5]:[2022-09-03 19:20:38,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default1]:[2022-09-03 19:20:38,973] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default3]:[2022-09-03 19:20:38,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default0]:[2022-09-03 19:20:39,032] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default5]:[2022-09-03 19:20:39,080] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default1]:[2022-09-03 19:20:38,998] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default5]:[2022-09-03 19:20:39,023] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default6]:[2022-09-03 19:20:39,078] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default0]:[2022-09-03 19:20:39,058] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default0]:[2022-09-03 19:20:39,117] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default4]:[2022-09-03 19:20:39,121] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default4]:[2022-09-03 19:20:39,232] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default4]:[2022-09-03 19:20:39,267] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default1]:[2022-09-03 19:20:39,305] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default0]:[2022-09-03 19:20:39,354] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default4]:[2022-09-03 19:20:39,372] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default0]:[2022-09-03 19:20:39,315] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default4]:[2022-09-03 19:20:39,384] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default4]:[2022-09-03 19:20:39,391] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default0]:[2022-09-03 19:20:39,459] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default7]:[2022-09-03 19:20:39,473] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default2]:[2022-09-03 19:20:39,502] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default1]:[2022-09-03 19:20:39,958] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default0]:[2022-09-03 19:20:43,252] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]:could not find arguments in the checkpoint ... [default0]: checkpoint version 3.0 [default6]:[2022-09-03 19:20:43,311] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default2]:[2022-09-03 19:20:44,239] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default5]:[2022-09-03 19:20:45,671] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default4]:[2022-09-03 19:20:46,750] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default7]:[2022-09-03 19:20:46,854] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default1]:[2022-09-03 19:20:47,153] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default7]:time (ms) | load-checkpoint: 27946.08 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 0 [default3]:[2022-09-03 19:20:49,956] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-03 19:20:50 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 26624 [default0]: test: 2048 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.073376 seconds
[default0]: number of documents: 90897616
[default0]: > dataset split:
[default0]: train:
[default0]: document indices in [0, 90897616) total of 90897616 documents
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.023701 seconds
[default0]: number of documents: 90897616
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.002349 seconds
[default0]: number of documents: 90897616
[default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy
[default0]: loaded indexed file in 0.045 seconds
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.070114 seconds
[default0]: number of documents: 15234080
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [14472376, 15234080) total of 761704 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.023145 [default0]: using: [default0]: number of documents: 761704 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 221749 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.009031 [default0]: > building shuffle index with split [0, 221749) and [221749, 221749) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.007578 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.061704 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.012006 [default0]: using: [default0]: number of documents: 307120 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 136142 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004022 [default0]: > building shuffle index with split [0, 136142) and [136142, 136142) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004680 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > 
building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.109275 seconds
[default0]: number of documents: 26176998
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [24868148, 26176998) total of 1308850 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.033932
[default0]: using:
[default0]: number of documents: 1308850
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 432310
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.011050
[default0]: > building shuffle index with split [0, 432310) and [432310, 432310) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.010164 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.321 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004367 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.025391 [default0]: using: [default0]: number of documents: 1042233 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 521544 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.012749 [default0]: > building shuffle index with split [0, 521544) and [521544, 521544) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.011185 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.093 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.061699 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.088874 [default0]: using: [default0]: number of documents: 3350291 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1740320 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.031995 [default0]: > building shuffle index with split [0, 1740320) and [1740320, 1740320) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.035224 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.171 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.003233 seconds
[default0]: number of documents: 5149795
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [4892305, 5149795) total of 257490 documents
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]: > only one epoch required, setting separate_last_epoch to False
[default0]: > elasped time to build and save doc-idx mapping (seconds): 0.010790
[default0]: using:
[default0]: number of documents: 257490
[default0]: number of epochs: 1
[default0]: sequence length: 2048
[default0]: total number of samples: 26369
[default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003548
[default0]: > building shuffle index with split [0, 26369) and [26369, 26369) ...
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002325 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.070313 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.077801 [default0]: using: [default0]: number of documents: 2942355 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1458653 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.025864 [default0]: > building shuffle index with split [0, 1458653) and [1458653, 1458653) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.030627 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.173 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.004030 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.017773 [default0]: using: [default0]: number of documents: 625713 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 134070 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005771 [default0]: > building shuffle index with split [0, 134070) and [134070, 134070) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004890 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: 
> building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.049936 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.004575 [default0]: using: [default0]: number of documents: 9030 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 2500 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002129 [default0]: > building shuffle index with split [0, 2500) and [2500, 2500) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.001906 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.097003 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.018446 [default0]: using: [default0]: number of documents: 615157 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 157243 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005989 [default0]: > building shuffle index with split [0, 157243) and [157243, 157243) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.006086 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.005 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.094863 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.006479 [default0]: using: [default0]: number of documents: 101653 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 20516 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002781 [default0]: > building shuffle index with split [0, 20516) and [20516, 20516) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002940 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 20517 [default0]: total number 
of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.136949 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.033566 [default0]: using: [default0]: number of documents: 1339678 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 101501 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.007937 [default0]: > building shuffle index with split [0, 101501) and [101501, 101501) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004781 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.005 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.073526 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.007548 [default0]: using: [default0]: number of documents: 157800 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 44181 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003218 [default0]: > building shuffle index with split [0, 44181) and [44181, 44181) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003480 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.032 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.051988 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.010852 [default0]: using: [default0]: number of documents: 334626 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 47612 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.006505 [default0]: > building shuffle index with split [0, 47612) and [47612, 47612) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003679 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 47613 [default0]: total number 
of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.066784 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.007867 [default0]: using: [default0]: number of documents: 150863 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 29297 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002972 [default0]: > building shuffle index with split [0, 29297) and [29297, 29297) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003090 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.084139 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.008036 [default0]: using: [default0]: number of documents: 182402 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 5658 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003649 [default0]: > building shuffle index with split [0, 5658) and [5658, 5658) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002745 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.057478 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009983 [default0]: using: [default0]: number of documents: 216364 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 12422 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005141 [default0]: > building shuffle index with split [0, 12422) and [12422, 12422) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003450 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 12423 [default0]: total number 
of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.126163 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.006520 [default0]: using: [default0]: number of documents: 134945 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 19132 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003168 [default0]: > building shuffle index with split [0, 19132) and [19132, 19132) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002956 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.079935 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.016953 [default0]: using: [default0]: number of documents: 638380 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 87927 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.007320 [default0]: > building shuffle index with split [0, 87927) and [87927, 87927) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004622 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.099889 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.010185 [default0]: using: [default0]: number of documents: 217116 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 69779 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004809 [default0]: > building shuffle index with split [0, 69779) and [69779, 69779) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003860 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.032 seconds [default0]: total number of samples: 69780 [default0]: total number 
of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.037849 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.008435 [default0]: using: [default0]: number of documents: 151136 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 22531 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003011 [default0]: > building shuffle index with split [0, 22531) and [22531, 22531) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002690 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.068468 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.008751 [default0]: using: [default0]: number of documents: 58128 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1607 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004747 [default0]: > building shuffle index with split [0, 1607) and [1607, 1607) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002614 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.002 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.072399 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.073575 [default0]: using: [default0]: number of documents: 2764732 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 690620 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.020033 [default0]: > building shuffle index with split [0, 690620) and [690620, 690620) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.016875 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.098 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.060615 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.057417 [default0]: using: [default0]: number of documents: 2242781 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 468688 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.015001 [default0]: > building shuffle index with split [0, 468688) and [468688, 468688) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.010441 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.211 seconds [default0]: total number of samples: 468689 [default0]: 
total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.154413 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.040766 [default0]: using: [default0]: number of documents: 1598495 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 497624 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.012644 [default0]: > building shuffle index with split [0, 497624) and [497624, 497624) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.012167 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.082485 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.043762 [default0]: using: [default0]: number of documents: 1705519 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 125119 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.009217 [default0]: > building shuffle index with split [0, 125119) and [125119, 125119) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005222 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.137 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.020810 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.057873 [default0]: using: [default0]: number of documents: 2188081 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1010591 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.021200 [default0]: > building shuffle index with split [0, 1010591) and [1010591, 1010591) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.020610 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.101 seconds [default0]: total number of samples: 1010592 [default0]: total number of 
epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.042073 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.005847 [default0]: using: [default0]: number of documents: 9880 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 4450 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.002449 [default0]: > building shuffle index with split [0, 4450) and [4450, 4450) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.002666 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios:
[default0]: dataset 0, input: 0.0330676, achieved: 0.0330676
[default0]: dataset 1, input: 0.0112421, achieved: 0.0112421
[default0]: dataset 2, input: 0.130272, achieved: 0.130272
[default0]: dataset 3, input: 0.221712, achieved: 0.221712
[default0]: dataset 4, input: 0.106678, achieved: 0.106678
[default0]: dataset 5, input: 0.00155951, achieved: 0.00155955
[default0]: dataset 6, input: 0.13054, achieved: 0.13054
[default0]: dataset 7, input: 0.010918, achieved: 0.0109181
[default0]: dataset 8, input: 0.000110214, achieved: 0.000110257
[default0]: dataset 9, input: 0.00549238, achieved: 0.00549235
[default0]: dataset 10, input: 0.000402122, achieved: 0.000402094
[default0]: dataset 11, input: 0.00747007, achieved: 0.00747007
[default0]: dataset 12, input: 0.000619047, achieved: 0.000619024
[default0]: dataset 13, input: 0.00103353, achieved: 0.0010336
[default0]: dataset 14, input: 0.000501201, achieved: 0.000501226
[default0]: dataset 15, input: 0.000667277, achieved: 0.000667231
[default0]: dataset 16, input: 0.000359281, achieved: 0.000359326
[default0]: dataset 17, input: 0.000508443, achieved: 0.000508519
[default0]: dataset 18, input: 0.00211373, achieved: 0.0021138
[default0]: dataset 19, input: 0.000912995, achieved: 0.000912961
[default0]: dataset 20, input: 0.00124543, achieved: 0.00124546
[default0]: dataset 21, input: 0.000315887, achieved: 0.00031594
[default0]: dataset 22, input: 0.0813721, achieved: 0.0813721
[default0]: dataset 23, input: 0.0552939, achieved: 0.0552939
[default0]: dataset 24, input: 0.0495415, achieved: 0.0495414
[default0]: dataset 25, input: 0.0246164, achieved: 0.0246163
[default0]: dataset 26, input: 0.120917, achieved: 0.120917
[default0]: dataset 27, input: 0.000517703, achieved: 0.000517666
[default0]:> elapsed time for building blendable dataset indices: 0.32 (sec)
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.011626 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: valid_ar: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.019219 [default0]: using: [default0]: number of documents: 761704 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 221749 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.007138 [default0]: > building shuffle index with split [0, 221749) and [221749, 221749) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.008055 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_valid_ar_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_valid_ar_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_valid_ar_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds 
[default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003727 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: valid_ca: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009523 [default0]: using: [default0]: number of documents: 307120 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 136142 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005169 [default0]: > building shuffle index with split [0, 136142) and [136142, 136142) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005888 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_valid_ca_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_valid_ca_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_valid_ca_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.050 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.009432 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: valid_code: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.031256 [default0]: using: [default0]: number of documents: 1308850 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 432310 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.011675 [default0]: > building shuffle index with split [0, 432310) and [432310, 432310) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.010848 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_valid_code_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_valid_code_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_valid_code_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.003708 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: valid_en: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.025001 [default0]: using: [default0]: number of documents: 1042233 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 521544 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.012206 [default0]: > building shuffle index with split [0, 521544) and [521544, 521544) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.011523 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_valid_en_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_valid_en_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_valid_en_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.295 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... 
[default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004232 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: valid_es: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.081355 [default0]: using: [default0]: number of documents: 3350291 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1740320 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.031946 [default0]: > building shuffle index with split [0, 1740320) and [1740320, 1740320) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.035767 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_valid_es_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_valid_es_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_valid_es_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.099 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.005425 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: valid_eu: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > last epoch number of samples (255) is smaller than 95.0% of number of samples per epoch (26369), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.013848 [default0]: using: [default0]: number of documents: 257490 [default0]: number of epochs: 2 [default0]: sequence length: 2048 [default0]: total number of samples: 52738 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004444 [default0]: > building shuffle index with split [0, 26369) and [26369, 52738) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.008972 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_valid_eu_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_valid_eu_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_valid_eu_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 52739 [default0]: total number of epochs: 2 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.012105 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: valid_fr: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.082339 [default0]: using: [default0]: number of documents: 2942355 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 1458653 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.025362 [default0]: > building shuffle index with split [0, 1458653) and [1458653, 1458653) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.030950 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_valid_fr_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_valid_fr_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_valid_fr_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.112 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... 
[default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004369 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: valid_id: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.016683 [default0]: using: [default0]: number of documents: 625713 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 134070 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.006031 [default0]: > building shuffle index with split [0, 134070) and [134070, 134070) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005082 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_valid_id_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_valid_id_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_valid_id_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.001778 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: valid_indic-as: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > last epoch number of samples (1622) is smaller than 95.0% of number of samples per epoch (2500), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.005350 [default0]: using: [default0]: number of documents: 9030 [default0]: number of epochs: 11 [default0]: sequence length: 2048 [default0]: total number of samples: 27503 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003190 [default0]: > building shuffle index with split [0, 25002) and [25002, 27503) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003881 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_valid_indic-as_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_valid_indic-as_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_valid_indic-as_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 27504 [default0]: total number of epochs: 11 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.009217 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: valid_indic-bn: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.016769 [default0]: using: [default0]: number of documents: 615157 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 157243 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.006598 [default0]: > building shuffle index with split [0, 157243) and [157243, 157243) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.006438 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_valid_indic-bn_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_valid_indic-bn_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_valid_indic-bn_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 
[default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003277 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: valid_indic-gu: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > last epoch number of samples (6108) is smaller than 95.0% of number of samples per epoch (20516), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.007372 [default0]: using: [default0]: number of documents: 101653 [default0]: number of epochs: 2 [default0]: sequence length: 2048 [default0]: total number of samples: 41033 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003541 [default0]: > building shuffle index with split [0, 20516) and [20516, 41033) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003385 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_valid_indic-gu_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_valid_indic-gu_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_valid_indic-gu_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 41034 [default0]: total number of epochs: 2 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004484 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: valid_indic-hi: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.032356 [default0]: using: [default0]: number of documents: 1339678 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 101501 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.008011 [default0]: > building shuffle index with split [0, 101501) and [101501, 101501) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005504 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_valid_indic-hi_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_valid_indic-hi_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_valid_indic-hi_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.003010 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: valid_indic-kn: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.006807 [default0]: using: [default0]: number of documents: 157800 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 44181 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005295 [default0]: > building shuffle index with split [0, 44181) and [44181, 44181) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003868 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_valid_indic-kn_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_valid_indic-kn_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_valid_indic-kn_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 
[default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.011882 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: valid_indic-ml: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009555 [default0]: using: [default0]: number of documents: 334626 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 47612 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004708 [default0]: > building shuffle index with split [0, 47612) and [47612, 47612) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005009 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_valid_indic-ml_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_valid_indic-ml_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_valid_indic-ml_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.012493 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: valid_indic-mr: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.007048 [default0]: using: [default0]: number of documents: 150863 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 29297 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003741 [default0]: > building shuffle index with split [0, 29297) and [29297, 29297) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004406 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_valid_indic-mr_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_valid_indic-mr_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_valid_indic-mr_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.005049 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: valid_indic-ne: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > last epoch number of samples (3989) is smaller than 95.0% of number of samples per epoch (5658), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.022136 [default0]: using: [default0]: number of documents: 182402 [default0]: number of epochs: 5 [default0]: sequence length: 2048 [default0]: total number of samples: 28294 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005497 [default0]: > building shuffle index with split [0, 22635) and [22635, 28294) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003313 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_valid_indic-ne_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_valid_indic-ne_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_valid_indic-ne_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total 
number of samples: 28295 [default0]: total number of epochs: 5 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003670 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: valid_indic-or: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > last epoch number of samples (1779) is smaller than 95.0% of number of samples per epoch (12422), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.017880 [default0]: using: [default0]: number of documents: 216364 [default0]: number of epochs: 3 [default0]: sequence length: 2048 [default0]: total number of samples: 37267 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004657 [default0]: > building shuffle index with split [0, 24845) and [24845, 37267) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004050 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_valid_indic-or_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_valid_indic-or_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_valid_indic-or_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 37268 [default0]: total number of epochs: 3 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.004296 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: valid_indic-pa: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > last epoch number of samples (7492) is smaller than 95.0% of number of samples per epoch (19132), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.008264 [default0]: using: [default0]: number of documents: 134945 [default0]: number of epochs: 2 [default0]: sequence length: 2048 [default0]: total number of samples: 38264 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003748 [default0]: > building shuffle index with split [0, 19132) and [19132, 38264) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003391 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_valid_indic-pa_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_valid_indic-pa_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_valid_indic-pa_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.041 seconds [default0]: total number of samples: 38265 [default0]: total number of epochs: 2 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.010025 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: valid_indic-ta: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.015930 [default0]: using: [default0]: number of documents: 638380 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 87927 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.005661 [default0]: > building shuffle index with split [0, 87927) and [87927, 87927) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005444 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_valid_indic-ta_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_valid_indic-ta_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_valid_indic-ta_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 
[default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.007728 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: valid_indic-te: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > only one epoch required, setting separate_last_epoch to False [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.007642 [default0]: using: [default0]: number of documents: 217116 [default0]: number of epochs: 1 [default0]: sequence length: 2048 [default0]: total number of samples: 69779 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004216 [default0]: > building shuffle index with split [0, 69779) and [69779, 69779) ... 
[default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.004507 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_valid_indic-te_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_valid_indic-te_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_valid_indic-te_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.013415 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: valid_indic-ur: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... 
[default0]: > last epoch number of samples (4093) is smaller than 95.0% of number of samples per epoch (22531), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.009297 [default0]: using: [default0]: number of documents: 151136 [default0]: number of epochs: 2 [default0]: sequence length: 2048 [default0]: total number of samples: 45063 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.003934 [default0]: > building shuffle index with split [0, 22531) and [22531, 45063) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.005818 [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_valid_indic-ur_indexmap_26624ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_valid_indic-ur_indexmap_26624ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_valid_indic-ur_indexmap_26624ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 45064 [default0]: total number of epochs: 2 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.003925 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: valid_nigercongo-all: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]: > last epoch number of samples (907) is smaller than 95.0% of number of samples per epoch (1607), setting separate_last_epoch to True [default0]: > elasped time to build and save doc-idx mapping (seconds): 0.024280 [default0]: using: [default0]: number of documents: 58128 [default0]: number of epochs: 17 [default0]: sequence length: 2048 [default0]: total number of samples: 27325 [default0]: > elasped time to build and save sample-idx mapping (seconds): 0.004548 [default0]: > building shuffle index with split [0, 25717) and [25717, 27325) ... [default0]: > elasped time to build and save shuffle-idx mapping (seconds): 0.003520 [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_valid_nigercongo-all_indexmap_26624ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_valid_nigercongo-all_indexmap_26624ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_valid_nigercongo-all_indexmap_26624ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.004 seconds
[default0]: total number of samples: 27326
[default0]: total number of epochs: 17
[default0]: > building dataset index ...
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: > finished creating indexed dataset in 0.000037 seconds
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Traceback (most recent call last):
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default2]: main()
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]: return f(*args, **kwargs)
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]: pretrain(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default2]: d = build_dataset_group_gpt(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default2]: dataset = _build_single_datasets(paths[0],
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default2]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default2]: indexed_dataset.sizes.shape[0]))
[default2]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default1]:Traceback (most recent call last):
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default1]: d = build_dataset_group_gpt(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default1]: dataset = _build_single_datasets(paths[0],
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default1]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default1]: indexed_dataset.sizes.shape[0]))
[default1]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in
[default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default0]: d = build_dataset_group_gpt(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default0]: dataset = _build_single_datasets(paths[0],
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default0]: indexed_dataset.sizes.shape[0]))
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = 
_build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset.sizes.shape[0])) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) 
[default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = 
build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]:Traceback (most recent call last): [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default3]: d = build_dataset_group_gpt( [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: train_ds, 
valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Traceback (most recent call last):
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]:Traceback (most recent call last):
[default4]: return f(*args, **kwargs)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default4]: d = build_dataset_group_gpt(
[default3]: main()
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default4]: dataset = _build_single_datasets(paths[0],
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]: main()
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default0]: d = build_dataset_group_gpt(
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default0]: dataset = _build_single_datasets(paths[0],
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default4]: indexed_dataset.sizes.shape[0]))
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: pretrain(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default3]: d = build_dataset_group_gpt(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default3]: dataset = _build_single_datasets(paths[0],
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default3]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default3]: indexed_dataset.sizes.shape[0]))
[default3]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default3]: d = build_dataset_group_gpt(
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: return f(*args, **kwargs)
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]: main()
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: return f(*args, **kwargs)
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Traceback (most recent call last):
[default0]: indexed_dataset.sizes.shape[0]))
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]: main()
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default3]: dataset = _build_single_datasets(paths[0],
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default0]: pretrain(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default3]: indexed_dataset.sizes.shape[0]))
[default3]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default0]: d = build_dataset_group_gpt(
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default0]: dataset = _build_single_datasets(paths[0],
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default0]: indexed_dataset.sizes.shape[0]))
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: d = build_dataset_group_gpt(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default0]: dataset = _build_single_datasets(paths[0],
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default0]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default0]: indexed_dataset.sizes.shape[0]))
[default0]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Traceback (most recent call last):
[default3]: pretrain(
[default0]: main()
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default3]: d = build_dataset_group_gpt(
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]: dataset = _build_single_datasets(paths[0],
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default4]:Traceback (most recent call last):
[default2]:Traceback (most recent call last):
[default3]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: d = build_dataset_group_gpt(
[default2]: main()
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: return f(*args, **kwargs)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default4]: d = build_dataset_group_gpt(
[default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]:Traceback (most recent call last):
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]: dataset = _build_single_datasets(paths[0],
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group
[default4]: dataset = _build_single_datasets(paths[0],
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default4]: indexed_dataset.sizes.shape[0]))
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: main()
[default3]: indexed_dataset.sizes.shape[0]))
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default3]:AttributeError: 'NoneType' object has no attribute 'sizes'
[default1]: return f(*args, **kwargs)
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default1]:Traceback (most recent call last):
[default2]: return f(*args, **kwargs)
[default0]: pretrain(
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]: main()
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: pretrain(
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]: pretrain(
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators
[default1]: pretrain(
[default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples)
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: indexed_dataset.sizes.shape[0])) [default2]: d = build_dataset_group_gpt( [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
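Note on the messages above: every rank reports the same dataset prefix as missing, and the accompanying AttributeError on `indexed_dataset.sizes` suggests the loader handed back None instead of a dataset. A minimal pre-flight sketch follows, assuming only what the message itself states, namely that a valid prefix is one to which `.idx` and `.bin` can be appended to get existing files. The helper names `missing_dataset_files` and `check_prefixes` are hypothetical and are not part of Megatron-DeepSpeed.

```python
import os


def missing_dataset_files(prefix, extensions=(".idx", ".bin")):
    """Return the files expected for `prefix` that do not exist on disk.

    Assumption (taken from the log message): a dataset prefix is valid
    when prefix + ".idx" and prefix + ".bin" are both regular files.
    """
    return [prefix + ext for ext in extensions if not os.path.isfile(prefix + ext)]


def check_prefixes(prefixes):
    """Map each prefix that has missing files to the list of missing paths."""
    report = {}
    for prefix in prefixes:
        missing = missing_dataset_files(prefix)
        if missing:
            report[prefix] = missing
    return report
```

Running `check_prefixes` over the prefixes in the data configuration on a login node, before submitting the job, would likely surface a bad path once rather than once per rank.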
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:Traceback (most recent call last): 
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]: pretrain( [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
[default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]:Traceback (most recent call last): [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]:Traceback (most recent call last): [default4]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( 
[default4]: indexed_dataset.sizes.shape[0])) [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no 
attribute 'sizes' [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: dataset = _build_single_datasets(paths[0], [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: dataset = _build_single_datasets(paths[0], [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( 
[default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document 
[default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default1]:Traceback (most recent call last): [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: return f(*args, **kwargs) [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: d = build_dataset_group_gpt( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: d = build_dataset_group_gpt( [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: pretrain( [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: dataset = _build_single_datasets(paths[0], [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default4]: main() [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default6]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 
'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default5]:Path should be a basename that both 
.idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]:Traceback (most recent call last): [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]:AttributeError: 'NoneType' object has no attribute 
'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: indexed_dataset.sizes.shape[0])) [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: main() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: return f(*args, **kwargs) [default4]: indexed_dataset.sizes.shape[0])) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that 
both .idx and .bin can be appended to get full filenames. [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = 
_build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: main() [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both 
.idx and .bin can be appended to get full filenames. [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default4]: return f(*args, **kwargs) [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default0]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: 
return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default1]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: main() [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: main() [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: dataset = _build_single_datasets(paths[0], [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]: dataset = _build_single_datasets(paths[0], [default4]:Traceback (most recent call last): [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: pretrain( [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:Traceback (most recent call last): [default3]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: pretrain( [default1]:Traceback (most recent call last): [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: main() [default2]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: d = build_dataset_group_gpt( [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: d = build_dataset_group_gpt( [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: main() [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]: pretrain( [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = 
build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]:
indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in
train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: train_data_iterator,
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has 
no attribute 'sizes' [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in
get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) 
[default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: main() [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: return f(*args, **kwargs) [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: pretrain( [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default0]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: d = build_dataset_group_gpt( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = 
_build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) 
[default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Path should be a basename that both .idx and .bin 
can be appended to get full filenames. [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: 
pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
[default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document
[default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames.
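Note: the messages above say the data path must be a prefix to which `.idx` and `.bin` can be appended, and the `AttributeError: 'NoneType' object has no attribute 'sizes'` in the surrounding tracebacks is consistent with the loader having returned `None` for a prefix whose files were not found. A minimal pre-flight check, run before launching the job, might look like the sketch below. The function name `missing_dataset_files` is hypothetical and is not part of Megatron-DeepSpeed; it only tests for the two files on disk.

```python
import os
from typing import List


def missing_dataset_files(data_prefix: str) -> List[str]:
    """Return the expected index/data files that are absent for this prefix.

    Hypothetical helper: it assumes only what the log message states, namely
    that '<prefix>.idx' and '<prefix>.bin' must both exist.
    """
    expected = [data_prefix + suffix for suffix in (".idx", ".bin")]
    # Keep only the paths that are not regular files on this filesystem.
    return [path for path in expected if not os.path.isfile(path)]
```

Calling `missing_dataset_files(prefix)` with the prefix from the log returns an empty list when both files are present. A non-empty result names the files to look for, which separates a wrong prefix from a file that is missing or on a filesystem not mounted on the compute node.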
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default4]:Traceback (most recent call last): [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]: return f(*args, **kwargs) [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, 
in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default3]: dataset = _build_single_datasets(paths[0], [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default2]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 
'sizes' [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: d = build_dataset_group_gpt( [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]:Traceback (most recent call last): [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
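Note on the failure above: every rank reports "Dataset does not exist" for the same basename and then raises AttributeError on `indexed_dataset.sizes`, which suggests the dataset loader returned None for a prefix whose `.idx` and/or `.bin` file could not be found. A minimal pre-flight sketch that checks the prefixes before launching the job is given below. The helper names `missing_indexed_dataset_files` and `check_prefixes` are hypothetical and are not part of Megatron-DeepSpeed; the only thing taken from the log is the rule that `.idx` and `.bin` are appended to the basename.

```python
import os


def missing_indexed_dataset_files(prefix):
    """Return the expected files (prefix + '.idx', prefix + '.bin') that are absent.

    `prefix` is the dataset basename, as described by the log message
    "Path should be a basename that both .idx and .bin can be appended to".
    """
    return [prefix + ext for ext in (".idx", ".bin")
            if not os.path.isfile(prefix + ext)]


def check_prefixes(prefixes):
    """Map each prefix that has missing files to the list of those files."""
    problems = {}
    for prefix in prefixes:
        missing = missing_indexed_dataset_files(prefix)
        if missing:
            problems[prefix] = missing
    return problems


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_data.py <prefix> [<prefix> ...]
    bad = check_prefixes(sys.argv[1:])
    for prefix, files in bad.items():
        print(f"Dataset prefix not usable: {prefix}")
        for path in files:
            print(f"  missing: {path}")
    sys.exit(1 if bad else 0)
```

Running such a check on the login node with the basename printed in the log would show whether the `.idx`/`.bin` pair is actually present on the filesystem before any ranks are started.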
[default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]: return f(*args, **kwargs) [default4]:AttributeError: 'NoneType' object has no attribute 
'sizes' [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: 
indexed_dataset.sizes.shape[0])) [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: dataset = _build_single_datasets(paths[0], [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: 
indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = 
build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: 
indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, 
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: 
pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
[default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: return f(*args, **kwargs) [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = 
get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: return f(*args, **kwargs) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], 
[default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: return f(*args, **kwargs) [default6]: main() [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
222, in get_indexed_dataset_ [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: indexed_dataset.sizes.shape[0])) [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
163, in train_valid_test_datasets_provider [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, 
**kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: d = build_dataset_group_gpt( [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: 
/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default4]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default4]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default0]: d = build_dataset_group_gpt( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default0]: dataset = _build_single_datasets(paths[0], [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default0]: indexed_dataset = get_indexed_dataset_(data_prefix, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default0]: indexed_dataset.sizes.shape[0])) [default0]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: return f(*args, **kwargs) [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default1]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default4]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default4]: d = build_dataset_group_gpt( [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default4]: dataset = _build_single_datasets(paths[0], [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: dataset = _build_single_datasets(paths[0], [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default4]: indexed_dataset.sizes.shape[0])) [default4]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default7]:Path should be a basename that both .idx and .bin can be appended to 
get full filenames. [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d = build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: dataset = _build_single_datasets(paths[0], [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default0]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default0]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
[default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default3]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default1]:Traceback (most recent call last): [default2]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default2]:Path should be a basename that both .idx and .bin can be appended to get full filenames. [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default1]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default1]: d = build_dataset_group_gpt( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group [default1]: dataset = _build_single_datasets(paths[0], [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default1]: indexed_dataset = get_indexed_dataset_(data_prefix, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default1]: indexed_dataset.sizes.shape[0])) [default1]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
[default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default5]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
[default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = 
build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: pretrain( 
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Dataset does not exist: /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document [default6]:Path should be a basename that both .idx and .bin can be appended to get full filenames. 
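Note on the two messages above: the "Dataset does not exist" line says the data path must be a basename to which both `.idx` and `.bin` can be appended. The repeated `AttributeError: 'NoneType' object has no attribute 'sizes'` is consistent with the dataset loader returning nothing for such a prefix, although the log alone does not confirm this. A minimal pre-flight check, assuming only that naming rule, might look like the sketch below. The function name `missing_dataset_files` is hypothetical and is not part of Megatron-DeepSpeed.

```python
from pathlib import Path


def missing_dataset_files(data_prefix):
    """Return the .idx/.bin files that are absent for a dataset prefix.

    Hypothetical helper: it relies only on the rule quoted in the log,
    namely that `<prefix>.idx` and `<prefix>.bin` must both exist.
    """
    # Append the suffix to the full prefix string. Path.with_suffix() would
    # replace anything after the last dot in the basename, which is wrong
    # for prefixes that themselves contain dots.
    candidates = [Path(str(data_prefix) + suffix) for suffix in (".idx", ".bin")]
    return [str(path) for path in candidates if not path.is_file()]
```

Running this over every prefix in the data configuration before submitting the job would report a missing file once, rather than as one traceback per rank.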
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default3]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default3]: d = build_dataset_group_gpt( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default3]: dataset = _build_single_datasets(paths[0], [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default3]: indexed_dataset = get_indexed_dataset_(data_prefix, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default3]: indexed_dataset.sizes.shape[0])) [default3]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets [default6]: d = build_dataset_group_gpt( [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most 
recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: 
indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) 
[default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) 
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default2]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default2]: d 
= build_dataset_group_gpt( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default2]: dataset = _build_single_datasets(paths[0], [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default2]: indexed_dataset = get_indexed_dataset_(data_prefix, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default2]: indexed_dataset.sizes.shape[0])) [default2]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
[default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default6]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default6]: d = build_dataset_group_gpt( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default6]: dataset = _build_single_datasets(paths[0], [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default6]: indexed_dataset = get_indexed_dataset_(data_prefix, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default6]: indexed_dataset.sizes.shape[0])) [default6]:AttributeError: 'NoneType' object has no attribute 'sizes' [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default7]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default7]: d = build_dataset_group_gpt( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default7]: dataset = _build_single_datasets(paths[0], [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default7]: indexed_dataset = get_indexed_dataset_(data_prefix, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default7]: indexed_dataset.sizes.shape[0])) [default7]:AttributeError: 'NoneType' object has no attribute 'sizes' [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators [default5]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider [default5]: d = build_dataset_group_gpt( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group [default5]: dataset = _build_single_datasets(paths[0], [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets [default5]: indexed_dataset = get_indexed_dataset_(data_prefix, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ [default5]: indexed_dataset.sizes.shape[0])) [default5]:AttributeError: 'NoneType' object has no attribute 'sizes' 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2983254 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 514600 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1444752 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1892794) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2671927) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1962601) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1375667) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 372264) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1780025) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1582084) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3609944) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2136952) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 515486) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 409961) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1802209) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3786102) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2229736) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 250396) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3154682) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3633705) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2639266) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3916199) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2018779) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1973333) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1553909) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3042223) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 929491) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1321263) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1716396) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3637157) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3022087 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2983255) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2933079) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 514601) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1982357) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 421896) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1444753) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3956402) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3594384) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 3022088) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, main() exec(code, 
run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, main() exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ exec(code, run_globals) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( main() main() raise ChildFailedError( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3022089) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2018780) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 250397) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1375668) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3022090) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 (pid: 2018781) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 250398) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1375669) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3022091) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 59 (local_rank: 3) exitcode : 1 (pid: 2018782) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 250399) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3022092) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 211 (local_rank: 3) exitcode : 1 (pid: 1375670) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2018783) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 250400) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1375671) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/4/error.json traceback : Traceback (most recent call last): main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3022093) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2018784) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 250401) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1375672) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/5/error.json traceback : Traceback (most recent call last): main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 265 (local_rank: 1) exitcode : 1 (pid: 3916200) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 main() rank : 31 (local_rank: 7) exitcode : 1 (pid: 3022094) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 129 
(local_rank: 1) exitcode : 1 (pid: 3609945) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( rank : 62 (local_rank: 6) exitcode : 1 (pid: 2018785) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 126 (local_rank: 6) exitcode : 1 (pid: 250402) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 214 (local_rank: 6) exitcode : 1 (pid: 1375673) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 266 (local_rank: 2) exitcode : 1 (pid: 3916201) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 130 (local_rank: 2) exitcode : 
1 (pid: 3609946) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 run(args) time : 2022-09-03_19:21:05 host : jean-zay-iam05-ib0 rank : 25 (local_rank: 1) exitcode : 1 (pid: 3022088) error_file: /tmp/torchelastic_bb8khjm9/none_zmgk5wmb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 63 (local_rank: 7) exitcode : 1 (pid: 2018786) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 127 (local_rank: 7) exitcode : 1 (pid: 250403) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' rank : 215 (local_rank: 7) exitcode : 1 (pid: 1375674) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1716397) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 267 (local_rank: 3) exitcode : 1 (pid: 3916202) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3609947) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam09-ib0 rank : 56 (local_rank: 0) exitcode : 1 (pid: 2018779) error_file: /tmp/torchelastic_4jr6sy1h/none_5hkzj4t6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 250396) error_file: /tmp/torchelastic_etz59gvn/none_2c5k4fru/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1375667) error_file: /tmp/torchelastic_7onnlf4_/none_i_nfp8dj/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 3916203) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3609948) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1716398) error_file: 
/tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 3916204) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3609949) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1716399) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1716400) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/4/error.json traceback : Traceback (most recent call last): run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run rank : 270 (local_rank: 6) exitcode : 1 (pid: 3916205) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 134 (local_rank: 6) exitcode : 1 (pid: 3609950) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators main() return _run_code(code, main_globals, None, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) elastic_launch( elastic_launch( 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1716401) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 271 (local_rank: 7) exitcode : 1 (pid: 3916206) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 135 (local_rank: 7) exitcode : 1 (pid: 3609951) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/7/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators run(args) raise ChildFailedError( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 
'NoneType' object has no attribute 'sizes' return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) exitcode : 1 (pid: 3916199) error_file: /tmp/torchelastic_z74qhco4/none_61x6u6iv/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3609944) error_file: /tmp/torchelastic_3udpluw2/none_ys61900j/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) raise ChildFailedError( rank : 174 (local_rank: 6) exitcode : 1 (pid: 1716402) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 193 (local_rank: 1) exitcode : 1 (pid: 3154683) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 
raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main rank : 175 (local_rank: 7) exitcode : 1 (pid: 1716403) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators elastic_launch( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 194 (local_rank: 2) exitcode : 1 (pid: 3154684) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 33 (local_rank: 1) exitcode : 1 (pid: 3633706) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 372265) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1716396) error_file: /tmp/torchelastic_1e10518l/none_y0h73j4v/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 3633707) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, 
return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 1962602) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3154685) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 372266) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], elastic_launch( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 514602) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3154686) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3633708) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 1962603) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 514603) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 372267) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3633709) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/4/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3594385) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators 
train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1582085) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets 
indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3154687) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/5/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 372268) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 1962604) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1444754) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 514604) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1582086) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 2983256) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3633710) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3594386) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2639267) 
error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 
2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 1962605) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 514605) error_file: 
/tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 198 (local_rank: 6) exitcode : 1 (pid: 3154688) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 372269) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1444755) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2639268) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) 
AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1582087) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 2983257) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3633711) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 139 (local_rank: 3) exitcode : 1 (pid: 3594387) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 
2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 1962606) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 514606) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/6/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( rank : 199 (local_rank: 7) exitcode : 1 (pid: 3154689) error_file: /tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 166 (local_rank: 6) exitcode : 1 (pid: 372270) error_file: 
/tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 252 
(local_rank: 4) exitcode : 1 (pid: 1582088) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1444756) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no 
attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3594388) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/4/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 99 (local_rank: 3) exitcode : 1 (pid: 2639269) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 2983258) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3154682) error_file: 
/tmp/torchelastic_sphnc1q5/none_f3e32xro/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( rank : 39 (local_rank: 7) exitcode : 1 (pid: 3633712) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 78 (local_rank: 6) exitcode : 1 (pid: 1962607) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1444757) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 159 (local_rank: 7) exitcode : 1 (pid: 514607) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, run(args) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 167 (local_rank: 7) exitcode : 1 (pid: 372271) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2639270) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/4/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 1 (pid: 1582089) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 2983259) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/5/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3633705) error_file: /tmp/torchelastic_xlrfbdub/none_l37ylyoh/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3594389) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/5/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 372264) error_file: /tmp/torchelastic_cb3t24vn/none_a7sltr4l/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 79 (local_rank: 7) exitcode : 1 (pid: 1962608) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 110 (local_rank: 6) exitcode : 1 (pid: 1444758) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/6/error.json traceback : Traceback (most recent call last): time : 2022-09-03_19:21:05 host : jean-zay-iam32-ib0 rank : 153 (local_rank: 1) exitcode : 1 (pid: 514601) error_file: /tmp/torchelastic_gaydngys/none_2_cu36td/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3042224) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2639271) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/5/error.json traceback : Traceback (most recent call last): raise ChildFailedError( rank : 254 (local_rank: 6) exitcode : 1 (pid: 1582090) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 246 (local_rank: 6) exitcode : 1 (pid: 2983260) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/6/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], rank : 142 (local_rank: 6) exitcode : 1 (pid: 3594390) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 1962601) error_file: /tmp/torchelastic_005hpajo/none_rhsqm4x5/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 
2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3042225) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1321264) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 111 (local_rank: 7) exitcode : 1 (pid: 1444759) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2136953) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 102 (local_rank: 6) exitcode : 1 (pid: 2639272) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 255 (local_rank: 7) exitcode : 1 (pid: 1582091) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 247 (local_rank: 7) exitcode : 1 (pid: 2983261) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], rank : 143 (local_rank: 7) exitcode : 1 (pid: 3594391) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3042226) error_file: 
/tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2136954) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1321265) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1582084) error_file: /tmp/torchelastic_32349wly/none_521ux4ib/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1444753) error_file: /tmp/torchelastic_aq7qoa0p/none_pak4ghd9/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam30-ib0 rank : 136 (local_rank: 0) exitcode : 1 (pid: 3594384) error_file: /tmp/torchelastic_5080j71n/none_a2d0wmyw/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return _run_code(code, main_globals, None, rank : 103 (local_rank: 7) exitcode : 1 (pid: 2639273) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, time : 2022-09-03_19:21:05 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 2983255) error_file: /tmp/torchelastic_d5pszdgf/none_2qzh3tvm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3042227) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2136955) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ exec(code, run_globals) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam18-ib0 rank : 96 (local_rank: 0) exitcode : 1 (pid: 2639266) error_file: /tmp/torchelastic_2q15z9jf/none_n7_kbc1k/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1321266) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3042228) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2136956) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 
220 (local_rank: 4) exitcode : 1 (pid: 1321267) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 515487) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group 
dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 238 (local_rank: 6) exitcode : 1 (pid: 3042229) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2136957) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1321268) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/5/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3786103) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main rank : 239 (local_rank: 7) exitcode : 1 (pid: 3042230) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 94 (local_rank: 6) exitcode : 1 (pid: 2136958) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 146 (local_rank: 2) exitcode : 1 (pid: 515488) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) rank : 222 (local_rank: 6) exitcode : 1 (pid: 1321269) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam42-ib0 rank : 232 (local_rank: 0) exitcode : 1 (pid: 3042223) error_file: /tmp/torchelastic_myv6g32a/none_ywprs7lj/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3786104) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 95 (local_rank: 7) exitcode : 1 (pid: 2136959) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 515489) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2136952) error_file: /tmp/torchelastic_owj83phu/none_klnk4x8j/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 
3786105) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 515490) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/4/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3786106) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/4/error.json traceback : Traceback 
(most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 515491) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3786107) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/5/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 150 (local_rank: 6) exitcode : 1 (pid: 515492) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 206 (local_rank: 6) exitcode : 1 (pid: 3786108) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) raise ChildFailedError( rank : 151 (local_rank: 7) exitcode : 1 (pid: 515493) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' rank : 207 (local_rank: 7) exitcode : 1 (pid: 3786109) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1780026) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 515486) error_file: /tmp/torchelastic_w5g68na6/none_os92wb9v/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2671928) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3786102) error_file: /tmp/torchelastic_yfr41awh/none_bv0yc0a8/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 421897) error_file: 
/tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1780027) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2671929) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 421898) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1780028) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2671930) error_file: 
/tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 421899) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1892795) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1780029) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/4/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 81 (local_rank: 1) exitcode : 1 (pid: 2229737) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2671931) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in 
build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 421900) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1780030) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2229738) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group 
dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2671932) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/5/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1892796) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 421901) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 286 (local_rank: 6) exitcode : 1 (pid: 1780031) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent rank : 230 (local_rank: 6) exitcode : 1 (pid: 2671933) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2229739) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main rank : 118 (local_rank: 6) exitcode : 1 (pid: 421902) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1892797) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 287 (local_rank: 7) exitcode : 1 (pid: 1780032) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/7/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2229740) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/4/error.json traceback : Traceback (most recent call last): main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper rank : 231 (local_rank: 7) exitcode : 1 (pid: 2671934) error_file: 
/tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 12 (local_rank: 4) exitcode : 1 (pid: 1892798) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/4/error.json traceback : Traceback (most recent call last): train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) rank : 119 (local_rank: 7) exitcode : 1 (pid: 421903) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1780025) error_file: /tmp/torchelastic_m2nbf03l/none_exuxn03h/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2229741) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/5/error.json traceback : Traceback (most recent call last): ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2671927) error_file: /tmp/torchelastic_2ly7hier/none_on1tx8hm/attempt_0/0/error.json traceback : Traceback (most recent 
call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam26-ib0 rank : 112 (local_rank: 0) exitcode : 1 (pid: 421896) error_file: /tmp/torchelastic_a3160a21/none_m00cdhoe/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1892799) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object 
has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 86 (local_rank: 6) exitcode : 1 (pid: 2229742) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 14 (local_rank: 6) exitcode : 1 (pid: 
1892800) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 3637158) error_file: /tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in 
__call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 87 (local_rank: 7) exitcode : 1 (pid: 2229743) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], rank : 15 (local_rank: 7) exitcode : 1 (pid: 1892801) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in 
get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2229736) error_file: /tmp/torchelastic_ls213d08/none_ooo0hkc0/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 3637159) error_file: /tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1892794) error_file: /tmp/torchelastic_t8h_tgzo/none_ebz8___g/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3637160) error_file: /tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3637161) error_file: /tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/4/error.json traceback : Traceback (most recent call last): raise ChildFailedError( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1553910) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1553911) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/2/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", 
line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) exitcode : 1 (pid: 1553912) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ 
indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1553913) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 
222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1553914) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code rank : 182 (local_rank: 6) exitcode : 1 (pid: 1553915) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ rank : 183 (local_rank: 7) exitcode : 1 (pid: 1553916) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' raise ChildFailedError( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1553909) error_file: /tmp/torchelastic_ry9_z_s2/none_bxvxmqx4/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( run(args) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 257 (local_rank: 1) exitcode : 1 (pid: 409962) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 2933080) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 409963) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : 
jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 2933081) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 409964) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 409965) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 2933082) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 17 (local_rank: 1) exitcode : 1 (pid: 1982358) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 409966) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 2933083) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1802210) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( rank : 262 (local_rank: 6) exitcode : 1 (pid: 409967) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, 
in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 2933084) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 1982359) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in 
_build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 409968) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators rank : 54 (local_rank: 6) exitcode : 1 (pid: 2933085) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
[2]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1802211) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 409961) error_file: /tmp/torchelastic_nx5vajso/none_a_tr8nkm/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = 
get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 1982360) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, rank : 55 (local_rank: 7) exitcode : 1 (pid: 2933086) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset 
= _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 1982361) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 rank : 187 
(local_rank: 3) exitcode : 1 (pid: 1802212) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam08-ib0 rank : 48 (local_rank: 0) exitcode : 1 (pid: 2933079) error_file: /tmp/torchelastic_vdeh9bo5/none_qdoa6_du/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in 
build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in 
train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1802213) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/4/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' 
object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 21 (local_rank: 5) exitcode : 1 (pid: 1982362) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : 
jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1802214) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/5/error.json traceback : Traceback (most recent call last): rank : 22 (local_rank: 6) exitcode : 1 (pid: 1982363) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, 
test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 190 (local_rank: 6) exitcode : 1 (pid: 1802215) 
error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 3956403) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( rank : 23 (local_rank: 7) exitcode : 1 (pid: 1982364) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = 
build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 (pid: 1982357) error_file: /tmp/torchelastic_nkgrtsdo/none_ln8ci3q9/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( rank : 191 (local_rank: 7) exitcode : 1 (pid: 1802216) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ ------------------------------------------------------------ Root Cause 
(first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1802209) error_file: /tmp/torchelastic_gzqobn1x/none_mgqbtjfn/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 (pid: 3956404) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = 
_build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 3956405) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 3956406) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 3956407) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 3956408) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 3956409) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ exec(code, run_globals) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 65 (local_rank: 1) exitcode : 1 (pid: 1973334) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 1973335) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 67 (local_rank: 3) exitcode : 1 (pid: 1973336) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 1973337) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 1973338) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main rank : 70 (local_rank: 6) exitcode : 1 (pid: 1973339) error_file: 
/tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 1973340) error_file: 
/tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam11-ib0 rank : 
64 (local_rank: 0) exitcode : 1 (pid: 1973333) error_file: /tmp/torchelastic_oghz3ihd/none_qp6d1tru/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 929492) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [2]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 929493) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [3]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 929494) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [4]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 929495) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 929496) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 278 (local_rank: 6) exitcode : 1 (pid: 929497) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 279 (local_rank: 7) exitcode : 1 (pid: 929498) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam47-ib0 rank : 272 (local_rank: 0) exitcode : 1 (pid: 929491) error_file: /tmp/torchelastic_5trsc9vd/none_u3eikdi1/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = 
build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 
2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 223 (local_rank: 7) exitcode : 1 (pid: 1321270) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam40-ib0 rank : 216 (local_rank: 0) exitcode : 1 (pid: 1321263) error_file: /tmp/torchelastic_rtr9j8kr/none_a37azpk4/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) 
AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [5]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3637162) error_file: 
/tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [6]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3637163) error_file: 
/tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' [7]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3637164) error_file: 
/tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam02-ib0 rank : 
0 (local_rank: 0) exitcode : 1 (pid: 3637157) error_file: /tmp/torchelastic_m1sg3pa6/none_bauk5415/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:21:05 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 3956402) error_file: /tmp/torchelastic_m2tmuwf7/none_ipar4jz6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1187, in build_train_valid_test_data_iterators train_ds, valid_ds, test_ds = build_train_valid_test_datasets_provider(train_val_test_num_samples) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 163, in train_valid_test_datasets_provider d = build_dataset_group_gpt( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 90, in build_dataset_group dataset = _build_single_datasets(paths[0], File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 134, in _build_single_datasets indexed_dataset = get_indexed_dataset_(data_prefix, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 222, in get_indexed_dataset_ indexed_dataset.sizes.shape[0])) 
AttributeError: 'NoneType' object has no attribute 'sizes' ============================================================ srun: error: jean-zay-iam14: task 10: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=927326.0 slurmstepd: error: *** STEP 927326.0 ON jean-zay-iam02 CANCELLED AT 2022-09-03T19:21:09 *** srun: error: jean-zay-iam32: task 19: Exited with exit code 1 srun: error: jean-zay-iam42: task 29: Exited with exit code 1 srun: error: jean-zay-iam44: task 31: Exited with exit code 1 srun: error: jean-zay-iam30: task 17: Exited with exit code 1 srun: error: jean-zay-iam19: task 13: Exited with exit code 1 srun: error: jean-zay-iam43: task 30: Exited with exit code 1 srun: error: jean-zay-iam05: task 3: Exited with exit code 1 srun: error: jean-zay-iam26: task 14: Exited with exit code 1 srun: error: jean-zay-iam39: task 26: Exited with exit code 1 srun: error: jean-zay-iam03: task 1: Exited with exit code 1 srun: error: jean-zay-iam06: task 4: Exited with exit code 1 srun: error: jean-zay-iam46: task 33: Exited with exit code 1 srun: error: jean-zay-iam41: task 28: Exited with exit code 1 srun: error: jean-zay-iam45: task 32: Exited with exit code 1 srun: error: jean-zay-iam18: task 12: Exited with exit code 1 srun: error: jean-zay-iam08: task 6: Exited with exit code 1 srun: error: jean-zay-iam27: task 15: Exited with exit code 1 srun: error: jean-zay-iam31: task 18: Exited with exit code 1 srun: error: jean-zay-iam36: task 23: Exited with exit code 1 srun: error: jean-zay-iam07: task 5: Exited with exit code 1 srun: error: jean-zay-iam47: task 34: Exited with exit code 1 srun: error: jean-zay-iam13: task 9: Exited with exit code 1 srun: error: jean-zay-iam35: task 22: Exited with exit code 1 srun: error: jean-zay-iam38: task 25: Exited with exit code 1 srun: error: jean-zay-iam37: task 24: Exited with exit code 1 srun: error: jean-zay-iam28: task 16: Exited with exit code 1 srun: error: jean-zay-iam34: task 21: Exited with exit 
code 1 srun: error: jean-zay-iam33: task 20: Exited with exit code 1 srun: error: jean-zay-iam09: task 7: Exited with exit code 1 srun: error: jean-zay-iam15: task 11: Exited with exit code 1 srun: error: jean-zay-iam11: task 8: Exited with exit code 1 srun: error: jean-zay-iam04: task 2: Exited with exit code 1 srun: error: jean-zay-iam40: task 27: Exited with exit code 1 srun: error: jean-zay-iam02: task 0: Exited with exit code 1 srun: error: jean-zay-iam52: task 35: Exited with exit code 1 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]:Traceback (most recent call last): [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = 
self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: main() 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]:Traceback (most recent call last): [default5]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: args, argv = self.parse_known_args(args, 
namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default1]: main() [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: return f(*args, **kwargs) [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: args = parser.parse_args() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default3]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: take_action(action, args, option_string) [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, 
namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: action(self, namespace, argument_values, option_string) [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = 
self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: action(self, namespace, argument_values, option_string) [default0]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return 
f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: return f(*args, **kwargs) [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: 
namespace, args = self._parse_known_args(args, namespace) [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: 
namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: 
pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return 
f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: 
_GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in 
parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: pretrain( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: 
_GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: 
main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default5]:Traceback (most recent call last): [default6]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: return f(*args, **kwargs) [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: pretrain( [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: return f(*args, **kwargs) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default7]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert 
len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", 
line 67, in parse_args [default3]: args = parser.parse_args() [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, 
option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default1]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in 
parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = 
self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = 
_parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: start_index = consume_optional(start_index) [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: main() [default4]: 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, 
namespace) [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:
return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: args, argv = self.parse_known_args(args, 
namespace) [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: take_action(action, args, option_string) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: action(self, namespace, argument_values, option_string) [default3]: take_action(action, args, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 
expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: action(self, namespace, argument_values, option_string) [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]:
set_global_variables(extra_args_provider=extra_args_provider, [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: args = parser.parse_args() [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main()
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]: return f(*args, **kwargs)
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]: pretrain(
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default2]: initialize_megatron(extra_args_provider=extra_args_provider,
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default2]: set_global_variables(extra_args_provider=extra_args_provider,
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default2]: args = _parse_args(extra_args_provider=extra_args_provider,
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default2]: args = parser.parse_args()
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default2]: args, argv = self.parse_known_args(args, namespace)
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default2]: namespace, args = self._parse_known_args(args, namespace)
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default2]: start_index = consume_optional(start_index)
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default2]: take_action(action, args, option_string)
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default2]: action(self, namespace, argument_values, option_string)
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default2]:AssertionError: Got multiple lines 4 instead of 1 expected
[default3]:Traceback (most recent call last):
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]: main()
[default0]:Traceback (most recent call last):
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]: return f(*args, **kwargs)
[default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: pretrain(
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default0]: initialize_megatron(extra_args_provider=extra_args_provider,
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: return f(*args, **kwargs)
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]:Traceback (most recent call last):
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default1]: main()
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]: return f(*args, **kwargs)
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default0]: set_global_variables(extra_args_provider=extra_args_provider,
[default1]: pretrain(
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default3]: pretrain(
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default1]: initialize_megatron(extra_args_provider=extra_args_provider,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default0]: args = _parse_args(extra_args_provider=extra_args_provider,
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default1]: set_global_variables(extra_args_provider=extra_args_provider,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default1]: args = _parse_args(extra_args_provider=extra_args_provider,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default0]: args = parser.parse_args()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default0]: args, argv = self.parse_known_args(args, namespace)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default1]: args = parser.parse_args()
[default3]: initialize_megatron(extra_args_provider=extra_args_provider,
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default3]: set_global_variables(extra_args_provider=extra_args_provider,
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default3]: args = _parse_args(extra_args_provider=extra_args_provider,
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default1]: args, argv = self.parse_known_args(args, namespace)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default0]: namespace, args = self._parse_known_args(args, namespace)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default0]: start_index = consume_optional(start_index)
[default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default3]: args = parser.parse_args()
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default0]: take_action(action, args, option_string)
[default3]: args, argv = self.parse_known_args(args, namespace)
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default3]: namespace, args = self._parse_known_args(args, namespace)
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default0]: action(self, namespace, argument_values, option_string)
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default1]: namespace, args = self._parse_known_args(args, namespace)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default1]: start_index = consume_optional(start_index)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default0]:AssertionError: Got multiple lines 4 instead of 1 expected
[default1]: take_action(action, args, option_string)
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default1]: action(self, namespace, argument_values, option_string)
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default4]:Traceback (most recent call last):
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]: main()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]: start_index = consume_optional(start_index)
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default3]: take_action(action, args, option_string)
[default4]: return f(*args, **kwargs)
[default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default1]:AssertionError: Got multiple lines 4 instead of 1 expected
[default4]: pretrain(
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default3]: action(self, namespace, argument_values, option_string)
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default4]: initialize_megatron(extra_args_provider=extra_args_provider,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default4]: set_global_variables(extra_args_provider=extra_args_provider,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default4]: args = _parse_args(extra_args_provider=extra_args_provider,
[default3]:AssertionError: Got multiple lines 4 instead of 1 expected
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default4]: args = parser.parse_args()
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default4]: args, argv = self.parse_known_args(args, namespace)
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default4]: namespace, args = self._parse_known_args(args, namespace)
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default4]: start_index = consume_optional(start_index)
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default4]: take_action(action, args, option_string)
[default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default4]: action(self, namespace, argument_values, option_string)
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default4]:AssertionError: Got multiple lines 4 instead of 1 expected
[default6]:Traceback (most recent call last):
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]: main()
[default5]:Traceback (most recent call last):
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]: main()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]: return f(*args, **kwargs)
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]: pretrain(
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default6]: initialize_megatron(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default6]: set_global_variables(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default6]: args = _parse_args(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default5]: return f(*args, **kwargs)
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]: args = parser.parse_args()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default6]: args, argv = self.parse_known_args(args, namespace)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default6]: namespace, args = self._parse_known_args(args, namespace)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default6]: start_index = consume_optional(start_index)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default6]: take_action(action, args, option_string)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default5]: pretrain(
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default5]: initialize_megatron(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default5]: set_global_variables(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default5]: args = _parse_args(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default6]: action(self, namespace, argument_values, option_string)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default6]:AssertionError: Got multiple lines 4 instead of 1 expected
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default5]: args = parser.parse_args()
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default5]: args, argv = self.parse_known_args(args, namespace)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default5]: namespace, args = self._parse_known_args(args, namespace)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default5]: start_index = consume_optional(start_index)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default5]: take_action(action, args, option_string)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default5]: action(self, namespace, argument_values, option_string)
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default5]:AssertionError: Got multiple lines 4 instead of 1 expected
[default5]:Traceback (most recent call last):
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]: main()
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]:Traceback (most recent call last):
[default5]: return f(*args, **kwargs)
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]: pretrain(
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default5]: initialize_megatron(extra_args_provider=extra_args_provider,
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]: main()
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default5]: set_global_variables(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default5]: args = _parse_args(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default5]: args = parser.parse_args()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: return f(*args, **kwargs)
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default5]: args, argv = self.parse_known_args(args, namespace)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default7]: initialize_megatron(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default7]: set_global_variables(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default5]: namespace, args = self._parse_known_args(args, namespace)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default5]: start_index = consume_optional(start_index)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default5]: take_action(action, args, option_string)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default5]: action(self, namespace, argument_values, option_string)
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default5]:AssertionError: Got multiple lines 4 instead of 1 expected
[default7]: args = _parse_args(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default7]: args = parser.parse_args()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default7]: args, argv = self.parse_known_args(args, namespace)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default7]: namespace, args = self._parse_known_args(args, namespace)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default7]: start_index = consume_optional(start_index)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default7]: take_action(action, args, option_string)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default7]: action(self, namespace, argument_values, option_string)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default7]:AssertionError: Got multiple lines 4 instead of 1 expected
[default6]:Traceback (most recent call last):
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]: main()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]: return f(*args, **kwargs)
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]: pretrain(
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default6]: initialize_megatron(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default6]: set_global_variables(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default6]: args = _parse_args(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default6]: args = parser.parse_args()
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default6]: args, argv = self.parse_known_args(args, namespace)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default6]: namespace, args = self._parse_known_args(args, namespace)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default6]: start_index = consume_optional(start_index)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default6]: take_action(action, args, option_string)
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default6]: action(self, namespace, argument_values, option_string)
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default6]:AssertionError: Got multiple lines 4 instead of 1 expected
[default7]:Traceback (most recent call last):
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]: main()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: return f(*args, **kwargs)
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default7]: initialize_megatron(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default7]: set_global_variables(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default7]: args = _parse_args(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args
[default7]: args = parser.parse_args()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args
[default7]: args, argv = self.parse_known_args(args, namespace)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args
[default7]: namespace, args = self._parse_known_args(args, namespace)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default7]: start_index = consume_optional(start_index)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default7]: take_action(action, args, option_string)
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default7]: action(self, namespace, argument_values, option_string)
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default7]:AssertionError: Got multiple lines 4 instead of 1 expected
[default7]:Traceback (most recent call last):
[default5]:Traceback (most recent call last):
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]: main()
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]: main()
[default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]: return f(*args, **kwargs)
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]: pretrain(
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default7]: initialize_megatron(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default5]: return f(*args, **kwargs)
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]: pretrain(
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain
[default5]: initialize_megatron(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron
[default5]: set_global_variables(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default5]: args = _parse_args(extra_args_provider=extra_args_provider,
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args
[default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider,
[default7]: set_global_variables(extra_args_provider=extra_args_provider,
[default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default7]: namespace, args = 
self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default7]: args = parser.parse_args() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional [default7]: take_action(action, args, option_string) [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default0]: args, argv = self.parse_known_args(args, 
namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: take_action(action, args, option_string) 
[default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", 
line 87, in initialize_megatron [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: pretrain( [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: args, argv = self.parse_known_args(args, namespace) [default4]: args = parser.parse_args() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() 
[default2]: args, argv = self.parse_known_args(args, namespace) [default1]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: namespace, args = self._parse_known_args(args, namespace) [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: start_index = consume_optional(start_index) [default3]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default3]: start_index = consume_optional(start_index) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: 
set_global_variables(extra_args_provider=extra_args_provider, [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines 
{len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: args, argv = self.parse_known_args(args, namespace) [default2]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args 
[default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default4]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default4]: take_action(action, args, option_string) [default2]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 
instead of 1 expected [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: return f(*args, **kwargs) [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: pretrain( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: args = parser.parse_args() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in 
_parse_known_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: namespace, args = self._parse_known_args(args, namespace) [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", 
line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return 
f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
[default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = 
_parse_args(extra_args_provider=extra_args_provider, [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default2]: namespace, args = self._parse_known_args(args, namespace) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default4]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: assert len(lines) == 1, f"Got multiple lines 
{len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:Traceback (most recent call last): 
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: 
set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: args, argv = self.parse_known_args(args, namespace) [default5]:Traceback (most recent call 
last): [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: start_index = consume_optional(start_index) [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: 
namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: 
namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 
87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: 
Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]:Traceback (most recent call last): [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: 
Got multiple lines 4 instead of 1 expected [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default1]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default3]: pretrain( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: main() [default3]: take_action(action, args, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:Traceback (most recent call last): [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]:Traceback (most recent call last): [default4]: 
set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = 
self.parse_known_args(args, namespace) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1800, in parse_known_args [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: main() [default4]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = 
_parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = 
self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default3]: return f(*args, **kwargs) [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default3]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( 
[default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: start_index = consume_optional(start_index) [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: action(self, namespace, argument_values, option_string) [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: action(self, namespace, argument_values, option_string) [default1]: start_index = consume_optional(start_index) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default2]: take_action(action, args, option_string) [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: start_index = consume_optional(start_index) [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: 
action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: 
pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = 
self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", 
line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got 
multiple lines 4 instead of 1 expected [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in 
_parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default6]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: take_action(action, args, option_string) [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: start_index = consume_optional(start_index) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) 
[default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: 
_GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: action(self, namespace, argument_values, option_string) [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: 
namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: 
namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: 
namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default6]: 
set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: namespace, args = 
self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 
867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in 
parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: take_action(action, args, 
option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default6]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: start_index = consume_optional(start_index) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = 
consume_optional(start_index) [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, 
[default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: pretrain( [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, 
in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default0]: namespace, args = self._parse_known_args(args, namespace) [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default0]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default0]: action(self, 
namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: 
action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", 
line 90, in set_global_variables [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: args = parser.parse_args() [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: action(self, namespace, argument_values, option_string) [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: action(self, namespace, argument_values, option_string) [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", 
line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]:Traceback (most recent call last): [default3]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default3]:Traceback (most 
recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]:Traceback (most recent call last): [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: return f(*args, **kwargs) [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" 
[default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: start_index = consume_optional(start_index) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: 
args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default5]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]: start_index = consume_optional(start_index) [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables 
[default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): 
[default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args 
[default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines 
{len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got 
multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, 
in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got 
multiple lines 4 instead of 1 expected [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: 
set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 
expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: pretrain( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args = _parse_args(extra_args_provider=extra_args_provider, 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default5]: take_action(action, args, option_string) 
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: 
set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = 
consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most 
recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( 
[default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: start_index = consume_optional(start_index) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: take_action(action, args, option_string) [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, 
in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: take_action(action, args, option_string) [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: 
set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 
expected [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in
parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:
pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS =
parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default5]: start_index = consume_optional(start_index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default5]: take_action(action, args, option_string) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default5]: action(self, namespace, argument_values, option_string) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default5]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
[default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py",
line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default0]: main()
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: start_index = consume_optional(start_index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() 
[default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default4]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: args, argv = self.parse_known_args(args, namespace) [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = 
self._parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got 
multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default1]: 
initialize_megatron(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default0]: initialize_megatron(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default1]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default1]: args = _parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default0]: set_global_variables(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default1]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default0]: args = _parse_args(extra_args_provider=extra_args_provider, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default1]: args = parser.parse_args() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in 
_parse_args [default0]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default1]: args, argv = self.parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default1]: namespace, args = self._parse_known_args(args, namespace) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default0]: args = parser.parse_args() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default1]: start_index = consume_optional(start_index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default1]: take_action(action, args, option_string) [default0]: args, argv = self.parse_known_args(args, namespace) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default0]: namespace, args = self._parse_known_args(args, namespace) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default1]: action(self, namespace, argument_values, option_string) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default0]: start_index = consume_optional(start_index) [default1]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default1]:AssertionError: Got multiple lines 4 instead of 1 expected [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default0]: take_action(action, args, option_string) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default0]: action(self, namespace, argument_values, option_string) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default0]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default0]:AssertionError: Got multiple lines 4 instead of 1 expected [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default2]: initialize_megatron(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default2]: set_global_variables(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default2]: args = _parse_args(extra_args_provider=extra_args_provider, [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default2]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default2]: args = parser.parse_args() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default2]: args, argv = self.parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default2]: namespace, args = self._parse_known_args(args, namespace) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default2]: start_index = consume_optional(start_index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default2]: take_action(action, args, option_string) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default2]: action(self, namespace, argument_values, option_string) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default2]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default2]:AssertionError: Got multiple lines 4 instead of 1 expected [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default6]: initialize_megatron(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default6]: set_global_variables(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default6]: args = _parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default6]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default6]: args = parser.parse_args() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default6]: args, argv = self.parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default6]: namespace, args = self._parse_known_args(args, namespace) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args [default6]: start_index = consume_optional(start_index) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default6]: take_action(action, args, option_string) [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default6]: action(self, namespace, argument_values, option_string) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default6]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default6]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default3]: initialize_megatron(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: 
set_global_variables(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: pretrain( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default7]: initialize_megatron(extra_args_provider=extra_args_provider, [default5]: initialize_megatron(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron [default7]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default7]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default3]: args = _parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default3]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default7]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default3]: args = parser.parse_args() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args = parser.parse_args() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default7]: args, argv = self.parse_known_args(args, namespace) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default7]: namespace, args = self._parse_known_args(args, namespace) [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default7]: start_index = consume_optional(start_index) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: args, argv = self.parse_known_args(args, namespace) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default3]: namespace, args = self._parse_known_args(args, namespace) [default7]: take_action(action, args, option_string) [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default7]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain [default4]: initialize_megatron(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron [default4]: set_global_variables(extra_args_provider=extra_args_provider, [default7]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, 
in _parse_known_args [default7]:AssertionError: Got multiple lines 4 instead of 1 expected [default3]: start_index = consume_optional(start_index) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default3]: take_action(action, args, option_string) [default4]: args = _parse_args(extra_args_provider=extra_args_provider, [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default4]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default3]: action(self, namespace, argument_values, option_string) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default3]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default4]: args = parser.parse_args() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default3]:AssertionError: Got multiple lines 4 instead of 1 expected [default4]: args, argv = self.parse_known_args(args, namespace) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default4]: namespace, args = self._parse_known_args(args, namespace) [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args [default4]: start_index = consume_optional(start_index) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional [default4]: take_action(action, args, option_string) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action [default4]: action(self, namespace, argument_values, option_string) [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ [default4]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" [default4]:AssertionError: Got multiple lines 4 instead of 1 expected [default5]: set_global_variables(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables [default5]: args = _parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args [default5]: _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args [default5]: args = parser.parse_args() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args [default5]: args, argv = self.parse_known_args(args, namespace) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args [default5]: namespace, args = self._parse_known_args(args, namespace) 
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args
[default5]: start_index = consume_optional(start_index)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional
[default5]: take_action(action, args, option_string)
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action
[default5]: action(self, namespace, argument_values, option_string)
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__
[default5]: assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected"
[default5]:AssertionError: Got multiple lines 4 instead of 1 expected
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3610932 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3610937 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634794 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3634800 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930473 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930479 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917199 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917205 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 251382) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1717395) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2934070) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2640256) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2019772) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3787115) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1322288) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3957473) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1803202) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1974320) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2672912) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1376649) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 516469) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 422876) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3023082) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3043217) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2984287) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2230720) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1583066) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3155663) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3610930) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 373245) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 515590) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3634793) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1983338) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1554901) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1963657) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 410954) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 930474) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1445743) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3595441) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1781032) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3917198) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1893854) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2138192) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3638440) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
Traceback (most recent call last):
File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", 
line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, 
None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return _run_code(code, main_globals, None, raise ChildFailedError( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, 
main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1322289) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) return _run_code(code, main_globals, None, main() action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1322290) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1322291) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, exec(code, run_globals) return f(*args, **kwargs) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 
107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args return _run_code(code, main_globals, None, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 220 (local_rank: 4) exitcode : 1 (pid: 1322292) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1322293) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/5/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables main() args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 222 (local_rank: 6) 
exitcode : 1 (pid: 1322294) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/6/error.json traceback : Traceback (most recent call last): exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables return f(*args, **kwargs) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 223 (local_rank: 7) exitcode : 1 (pid: 1322295) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in 
__call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam40-ib0 rank : 216 (local_rank: 0) exitcode : 1 (pid: 1322288) error_file: /tmp/torchelastic_byq_3zaf/none_7339p6ap/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args 
args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( main() run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 373246) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 373247) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 373248) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, 
option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 373249) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1583067) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 373250) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/5/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 25 (local_rank: 1) exitcode : 1 (pid: 3023083) error_file: 
/tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = 
consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 166 (local_rank: 6) exitcode : 1 (pid: 373251) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/6/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables elastic_launch( action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1583068) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 167 (local_rank: 7) exitcode : 1 (pid: 373252) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3023084) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, 
in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) elastic_launch( AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1583069) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 373245) error_file: /tmp/torchelastic_17xl314e/none_oheproc3/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, raise ChildFailedError( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3023085) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args 
namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 252 (local_rank: 4) exitcode : 1 (pid: 1583070) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in 
_parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ raise ChildFailedError( raise ChildFailedError( error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in 
parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3023086) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 3957474) error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 1 (pid: 1583071) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 930475) error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, 
argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3023087) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1803203) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 17 (local_rank: 1) exitcode : 1 (pid: 1983339) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 254 (local_rank: 6) exitcode : 1 (pid: 1583072) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/6/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3023088) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 930476) error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in 
parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" 
AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 255 (local_rank: 7) exitcode : 1 (pid: 1583073) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 (pid: 3957475) error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1803204) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 31 (local_rank: 7) exitcode : 1 (pid: 3023089) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 1983340) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/2/error.json traceback : Traceback (most recent call last): 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 930477) error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1583066) error_file: /tmp/torchelastic_8gv63z7u/none_pzg0vw91/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" return f(*args, **kwargs) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args 
= parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 257 (local_rank: 1) exitcode : 1 (pid: 410955) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 3957476) error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main AssertionError: 
Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 187 (local_rank: 3) exitcode : 1 (pid: 1803205) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" raise ChildFailedError( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam05-ib0 rank : 24 (local_rank: 0) exitcode : 1 (pid: 3023082) error_file: /tmp/torchelastic_xsa2xxzd/none_iu4jlbla/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", 
line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 1983341) error_file: 
/tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 930478) action(self, namespace, argument_values, option_string) 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 3957477) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1803206) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1445744) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 1983342) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 279 (local_rank: 7) exitcode : 1 (pid: 930480) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 410956) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 3957478) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1803207) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in 
parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 21 (local_rank: 5) exitcode : 1 (pid: 1983343) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple 
lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 930474) error_file: /tmp/torchelastic_hzd9iph0/none_k8a0i7oa/attempt_0/1/error.json traceback : Traceback (most recent call last): args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 410957) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 3957479) error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 190 (local_rank: 6) exitcode : 1 (pid: 1803208) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/6/error.json traceback : Traceback (most recent call last): Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1445745) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 22 (local_rank: 6) exitcode : 1 (pid: 1983344) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 
rank : 65 (local_rank: 1) exitcode : 1 (pid: 1974321) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args raise 
ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 410958) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 3957480) error_file: 
/tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 191 (local_rank: 7) exitcode : 1 (pid: 1803209) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 1963658) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 23 (local_rank: 7) exitcode : 1 (pid: 1983345) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1445746) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 410959) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 1974322) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 3957473) error_file: /tmp/torchelastic_wkfzfndb/none_mufjtl59/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 
expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1803202) error_file: /tmp/torchelastic_6gwldhzm/none_742uc8bm/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1445747) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 
2022-09-03_19:31:35 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 (pid: 1983338) error_file: /tmp/torchelastic_2uadjomm/none_t9ed4zzj/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", 
line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 262 (local_rank: 6) exitcode : 1 (pid: 410960) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 1963659) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args 
= self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead 
of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1445748) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 67 (local_rank: 3) exitcode : 1 (pid: 1974323) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected 
============================================================ return launch_agent(self._config, self._entrypoint, list(args)) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 410961) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, 
argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 1974324) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 1963660) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 110 (local_rank: 6) exitcode : 1 (pid: 1445749) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/6/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1800, in parse_known_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 1974325) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 410954) error_file: /tmp/torchelastic_dd26tour/none_f_rjlmvd/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 1963661) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) error_file: 
/tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, elastic_launch( error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, 
namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 111 (local_rank: 7) exitcode : 1 (pid: 1445750) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = 
self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam28-ib0 rank : 129 (local_rank: 1) exitcode : 1 (pid: 3610931) error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 1963662) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 70 (local_rank: 6) exitcode : 1 (pid: 1974326) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/6/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam19-ib0 rank : 104 (local_rank: 0) exitcode : 1 (pid: 1445743) error_file: /tmp/torchelastic_85m4f6j4/none_ngr5p3z0/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 78 (local_rank: 6) exitcode : 1 (pid: 1963663) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args 
args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 1974327) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2672913) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 
1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3610933) error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam11-ib0 rank : 64 (local_rank: 0) exitcode : 1 (pid: 1974320) error_file: /tmp/torchelastic_z0s8xjbj/none_sdb3x104/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 79 (local_rank: 7) exitcode : 1 (pid: 1963664) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3610934) error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = 
parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) raise ChildFailedError( namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : 
jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3610935) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 1963657) error_file: /tmp/torchelastic_r8e07zsa/none_t57limgo/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2672914) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam28-ib0 rank : 134 (local_rank: 6) exitcode : 1 (pid: 3610936) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 2934071) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3610930) error_file: /tmp/torchelastic_qm2fcqrw/none_dbvly6ii/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2672915) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = 
parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: 
Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2672916) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 2934072) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2672917) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, return launch_agent(self._config, self._entrypoint, list(args)) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 230 (local_rank: 6) exitcode : 1 (pid: 2672918) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/6/error.json traceback : Traceback (most recent call last): AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 2934073) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 
4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 2934074) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 231 (local_rank: 7) exitcode : 1 (pid: 2672919) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, raise ChildFailedError( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 2934075) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/5/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ 
assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2672912) error_file: /tmp/torchelastic_p95dml70/none_gzy_kbbo/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 54 (local_rank: 6) exitcode : 1 (pid: 2934076) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1893855) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 55 (local_rank: 7) exitcode : 1 (pid: 2934077) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam08-ib0 rank : 48 (local_rank: 0) exitcode : 1 (pid: 2934070) error_file: /tmp/torchelastic_yh_swdea/none_hbkuggnj/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1893856) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" return 
launch_agent(self._config, self._entrypoint, list(args)) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1893857) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = 
self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 12 (local_rank: 4) exitcode : 1 (pid: 1893858) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1893859) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional 
take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 14 (local_rank: 6) exitcode : 1 (pid: 1893860) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args return _run_code(code, main_globals, None, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 15 (local_rank: 7) exitcode : 1 (pid: 1893861) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1893854) error_file: /tmp/torchelastic_j60kwew5/none_9dxoim_j/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1781033) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in action(self, namespace, argument_values, 
option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = 
self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action return _run_code(code, main_globals, None, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1781034) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1781035) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in 
_parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1781036) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1781037) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables return f(*args, **kwargs) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 286 (local_rank: 6) exitcode : 1 (pid: 1781038) error_file: 
/tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/6/error.json traceback : Traceback (most recent call last): return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables return _run_code(code, main_globals, None, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, 
args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 287 (local_rank: 7) exitcode : 1 (pid: 1781039) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in 
main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1781032) error_file: /tmp/torchelastic_gkp3s0i5/none_4ds07cgk/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 
expected ============================================================ main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) exec(code, run_globals) return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return 
_run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return 
launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 3638441) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 3638442) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1717396) error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args elastic_launch( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3638443) error_file: 
/tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args elastic_launch( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1717397) error_file: 
/tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3638444) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", 
line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3638445) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/5/error.json return _run_code(code, main_globals, None, return launch_agent(self._config, self._entrypoint, list(args)) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( raise ChildFailedError( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead 
of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1717398) error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3638446) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/6/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1376650) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 251383) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1717399) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) raise ChildFailedError( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1717400) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/5/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 2984288) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : 
jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1376651) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 251384) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 174 (local_rank: 6) exitcode : 1 (pid: 1717401) error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/6/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time 
: 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3787116) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2019773) error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 153 (local_rank: 1) exitcode : 1 (pid: 515591) error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 2984289) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 211 (local_rank: 3) exitcode : 1 (pid: 1376652) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 175 (local_rank: 7) exitcode : 1 (pid: 1717402) error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1554902) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 251385) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 515592) error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return 
launch_agent(self._config, self._entrypoint, list(args)) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 (pid: 2019774) error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3787117) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1376653) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 2984290) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ 
assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 251386) elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1717395) error_file: /tmp/torchelastic_btkcerpw/none_a_71b3s7/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1554903) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 81 (local_rank: 1) exitcode : 1 (pid: 2230721) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 515593) error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1376654) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = 
consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 59 (local_rank: 3) exitcode : 1 (pid: 2019775) error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 3787118) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 2984291) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 3634795) error_file: /tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 251387) return launch_agent(self._config, self._entrypoint, list(args)) action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 422877) error_file: /tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2230722) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) exitcode : 1 (pid: 1554904) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 515594) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2019776) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3787119) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 2984292) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3634796) error_file: /tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 214 (local_rank: 6) exitcode : 1 (pid: 1376655) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/6/error.json traceback : Traceback (most recent call last): args = 
_parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/4/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/5/error.json traceback : Traceback (most recent call last): args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args action(self, namespace, argument_values, 
option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 422878) error_file: /tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3595442) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 126 (local_rank: 6) exitcode : 1 (pid: 251388) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/6/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 193 (local_rank: 1) exitcode : 1 (pid: 3155664) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, 
option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1554905) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 515595) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3043218) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2230723) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple 
lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2019777) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3787120) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args 
start_index = consume_optional(start_index) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/5/error.json traceback : Traceback (most recent call last): args, 
argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/5/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 246 (local_rank: 6) exitcode : 1 (pid: 2984293) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/6/error.json traceback : 
Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 516470) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3634797) error_file: 
/tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 215 (local_rank: 7) exitcode : 1 (pid: 1376656) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1554906) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 422879) error_file: /tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3595443) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 127 (local_rank: 7) exitcode : 1 (pid: 251389) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 194 (local_rank: 2) exitcode : 1 (pid: 3155665) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2230724) args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 515596) error_file: 
/tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3043219) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 62 (local_rank: 6) exitcode : 1 (pid: 2019778) error_file: 
/tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 206 (local_rank: 6) exitcode : 1 (pid: 3787121) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/6/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = 
parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3634798) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 247 (local_rank: 7) exitcode : 1 (pid: 2984294) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args 
start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 422880) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 146 (local_rank: 2) exitcode : 1 (pid: 516471) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/5/error.json traceback : Traceback (most recent call 
last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1376649) error_file: /tmp/torchelastic_fztdq5ky/none_1h2z1ov6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2230725) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 182 (local_rank: 6) exitcode : 1 (pid: 1554907) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/6/error.json traceback : Traceback (most recent call last): error_file: 
/tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 139 (local_rank: 3) exitcode : 1 (pid: 3595444) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 251382) error_file: /tmp/torchelastic_tcgfqc_9/none_2q_nrw35/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3155666) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/5/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, 
option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 159 (local_rank: 7) exitcode : 1 (pid: 515597) error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3634799) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3043220) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 63 (local_rank: 7) exitcode : 1 (pid: 2019779) error_file: 
/tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 207 (local_rank: 7) exitcode : 1 (pid: 3787122) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, 
namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 422881) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/6/error.json traceback : Traceback (most recent call last): args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2138193) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args args = 
_parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in 
main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam43-ib0 rank : 240 (local_rank: 0) exitcode : 1 (pid: 2984287) error_file: /tmp/torchelastic_lves7nz4/none_lj2ebc2n/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = 
parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) error_file: /tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/5/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in 
_parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3595445) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3155667) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 86 (local_rank: 6) exitcode : 1 (pid: 2230726) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/6/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 183 (local_rank: 7) exitcode : 1 (pid: 1554908) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in 
initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 516472) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in 
take_action namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3043221) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam32-ib0 rank : 152 (local_rank: 0) exitcode : 1 (pid: 515590) error_file: /tmp/torchelastic_77vihpk7/none_rgdge1sh/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3634793) error_file: /tmp/torchelastic_nemme6rn/none_liu_6ujh/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam09-ib0 rank : 56 (local_rank: 0) exitcode : 1 (pid: 2019772) error_file: /tmp/torchelastic_0b2jd8j2/none_dl0bj28o/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3787115) error_file: /tmp/torchelastic_w18la3da/none_q3kjt4pv/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 118 (local_rank: 6) exitcode : 1 (pid: 422882) error_file: 
/tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3595446) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2138194) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3155668) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", 
line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 516473) args = _parse_args(extra_args_provider=extra_args_provider, error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3043222) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 87 (local_rank: 7) exitcode : 1 (pid: 2230727) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ 
assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1554901) error_file: /tmp/torchelastic_3d30mlq0/none_t64zxbli/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 119 (local_rank: 7) exitcode : 1 (pid: 422883) error_file: 
/tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 516474) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 142 (local_rank: 6) exitcode : 1 (pid: 3595447) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/6/error.json traceback : Traceback 
(most recent call last): AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2138195) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 198 (local_rank: 6) exitcode : 1 (pid: 3155669) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = 
_parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 238 (local_rank: 6) exitcode : 1 (pid: 3043223) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2230720) error_file: /tmp/torchelastic_vehlqt3g/none_wi2nq8g1/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2138196) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, 
namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam26-ib0 rank : 112 (local_rank: 0) exitcode : 1 (pid: 422876) error_file: /tmp/torchelastic_v9i_ryr1/none_wd0q4izm/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 
2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 150 (local_rank: 6) exitcode : 1 (pid: 516475) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 143 (local_rank: 7) exitcode : 1 (pid: 3595448) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, _GLOBAL_ARGS = 
parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 199 (local_rank: 7) exitcode : 1 (pid: 3155670) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple 
lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action 
action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 239 (local_rank: 7) exitcode : 1 (pid: 3043224) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args 
args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2138197) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/5/error.json traceback : Traceback (most recent call last): _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in 
consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam30-ib0 rank : 136 (local_rank: 0) exitcode : 1 (pid: 3595441) error_file: /tmp/torchelastic_40xv_mnq/none_7vu4axe_/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3155663) error_file: /tmp/torchelastic_rdxggd3p/none_andj2wmx/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 151 (local_rank: 7) exitcode : 1 (pid: 516476) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, args = _parse_args(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam42-ib0 rank : 232 (local_rank: 0) exitcode : 1 (pid: 3043217) error_file: /tmp/torchelastic_wf9l0zby/none_16wf0h0p/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain 
initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, 
namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 94 (local_rank: 6) exitcode : 1 (pid: 2138198) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 
expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 516469) error_file: /tmp/torchelastic_ooiooxf6/none_8lquc4pz/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron 
set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 95 (local_rank: 7) exitcode : 1 (pid: 2138199) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in 
_parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2138192) error_file: /tmp/torchelastic_agwvxjuw/none_qzhowyey/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ main() 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 266 (local_rank: 2) exitcode : 1 (pid: 3917200) error_file: /tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 267 (local_rank: 3) exitcode : 1 (pid: 3917201) error_file: 
/tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = 
self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 3917202) error_file: /tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = 
_parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 3917203) error_file: /tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 270 (local_rank: 6) exitcode : 1 (pid: 3917204) raise ChildFailedError( error_file: /tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:34 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) exitcode : 1 (pid: 3917198) error_file: /tmp/torchelastic_130ddss9/none_zy1lm3yo/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2640257) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in 
parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [2]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2640258) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [3]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 99 
(local_rank: 3) exitcode : 1 (pid: 2640259) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = 
consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [4]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2640260) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in 
_parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [5]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2640261) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ 
assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [6]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 102 (local_rank: 6) exitcode : 1 (pid: 2640262) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 103 (local_rank: 7) exitcode : 1 (pid: 2640263) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in 
set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam18-ib0 rank : 96 (local_rank: 0) exitcode : 1 (pid: 2640256) error_file: /tmp/torchelastic_doahr19s/none_27vylz0m/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, 
option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected [7]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3638447) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-03_19:31:35 host : jean-zay-iam02-ib0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 3638440) error_file: /tmp/torchelastic__h_i6_s2/none_usa8vt7d/attempt_0/0/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 99, in pretrain initialize_megatron(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/initialize.py", line 87, in initialize_megatron set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 90, in set_global_variables args = _parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/global_vars.py", line 107, in _parse_args _GLOBAL_ARGS = parse_args(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 67, in parse_args args = parser.parse_args() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1768, in parse_args args, argv = self.parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1800, in parse_known_args namespace, args = self._parse_known_args(args, namespace) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 2006, in _parse_known_args start_index = consume_optional(start_index) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 
1946, in consume_optional take_action(action, args, option_string) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/argparse.py", line 1874, in take_action action(self, namespace, argument_values, option_string) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/arguments.py", line 867, in __call__ assert len(lines) == 1, f"Got multiple lines {len(lines)} instead of 1 expected" AssertionError: Got multiple lines 4 instead of 1 expected ============================================================ srun: error: jean-zay-iam32: task 19: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=927342.0 srun: error: jean-zay-iam18: task 12: Exited with exit code 1 srun: error: jean-zay-iam30: task 17: Exited with exit code 1 srun: error: jean-zay-iam44: task 31: Exited with exit code 1 srun: error: jean-zay-iam28: task 16: Exited with exit code 1 srun: error: jean-zay-iam26: task 14: Exited with exit code 1 srun: error: jean-zay-iam07: task 5: Exited with exit code 1 srun: error: jean-zay-iam46: task 33: Exited with exit code 1 srun: error: jean-zay-iam09: task 7: Exited with exit code 1 srun: error: jean-zay-iam08: task 6: Exited with exit code 1 srun: error: jean-zay-iam03: task 1: Exited with exit code 1 srun: error: jean-zay-iam34: task 21: Exited with exit code 1 srun: error: jean-zay-iam45: task 32: Exited with exit code 1 srun: error: jean-zay-iam38: task 25: Exited with exit code 1 srun: error: jean-zay-iam11: task 8: Exited with exit code 1 srun: error: jean-zay-iam39: task 26: Exited with exit code 1 srun: error: jean-zay-iam40: task 27: Exited with exit code 1 srun: error: jean-zay-iam13: task 9: Exited with exit code 1 srun: error: jean-zay-iam04: task 2: Exited with exit code 1 srun: error: jean-zay-iam35: task 22: Exited with exit code 1 srun: error: jean-zay-iam05: task 3: Exited with exit code 1 srun: error: jean-zay-iam36: task 23: Exited with exit code 1 
srun: error: jean-zay-iam31: task 18: Exited with exit code 1 srun: error: jean-zay-iam33: task 20: Exited with exit code 1 srun: error: jean-zay-iam06: task 4: Exited with exit code 1 srun: error: jean-zay-iam41: task 28: Exited with exit code 1 srun: error: jean-zay-iam47: task 34: Exited with exit code 1 srun: error: jean-zay-iam27: task 15: Exited with exit code 1 srun: error: jean-zay-iam37: task 24: Exited with exit code 1 srun: error: jean-zay-iam52: task 35: Exited with exit code 1 srun: error: jean-zay-iam15: task 11: Exited with exit code 1 srun: error: jean-zay-iam19: task 13: Exited with exit code 1 srun: error: jean-zay-iam02: task 0: Exited with exit code 1 srun: error: jean-zay-iam43: task 30: Exited with exit code 1 srun: error: jean-zay-iam42: task 29: Exited with exit code 1 srun: error: jean-zay-iam14: task 10: Exited with exit code 1 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 
0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ 
None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.927375.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 1 [default0]: eval_only ....................................... None [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ 
None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 
1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... 
None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. True [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... 
[['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. 
loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default7]:> setting tensorboard ... [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... 
torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-03 19:40:13,794] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-03 19:40:23,382] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.097 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... 
[default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 6.724 seconds [default0]:time to initialize megatron (seconds): 14.205 [default0]:[after megatron is initialized] datetime: 2022-09-03 19:40:30 [default0]:building GPT model ... 
[default0]:[2022-09-03 19:40:30,246] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-03 19:40:30,246] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-03 19:40:30,246] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.07 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287}
[default0]:[2022-09-03 19:40:34,099] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding
[default0]:stage=0 layers=3
[default0]: 0: _to_float16
[default0]: 1: EmbeddingPipe
[default0]: 2:
[default0]:stage=1 layers=1
[default0]: 3: ParallelTransformerLayerPipe
[default0]:stage=2 layers=1
[default0]: 4: ParallelTransformerLayerPipe
[default0]:stage=3 layers=1
[default0]: 5: ParallelTransformerLayerPipe
[default0]:stage=4 layers=1
[default0]: 6: ParallelTransformerLayerPipe
[default0]:stage=5 layers=1
[default0]: 7: ParallelTransformerLayerPipe
[default0]:stage=6 layers=1
[default0]: 8: ParallelTransformerLayerPipe
[default0]:stage=7 layers=1
[default0]: 9: ParallelTransformerLayerPipe
[default0]:stage=8 layers=1
[default0]: 10: ParallelTransformerLayerPipe
[default0]:stage=9 layers=1
[default0]: 11: ParallelTransformerLayerPipe
[default0]:stage=10 layers=1
[default0]: 12: ParallelTransformerLayerPipe
[default0]:stage=11 layers=1
[default0]: 13: ParallelTransformerLayerPipe
[default0]:stage=12 layers=1
[default0]: 14: ParallelTransformerLayerPipe
[default0]:stage=13 layers=1
[default0]: 15: ParallelTransformerLayerPipe
[default0]:stage=14 layers=1
[default0]: 16: ParallelTransformerLayerPipe
[default0]:stage=15 layers=1
[default0]: 17: ParallelTransformerLayerPipe
[default0]:stage=16 layers=1
[default0]: 18: ParallelTransformerLayerPipe
[default0]:stage=17 layers=1
[default0]: 19: ParallelTransformerLayerPipe
[default0]:stage=18 layers=1
[default0]: 20: ParallelTransformerLayerPipe
[default0]:stage=19 layers=1
[default0]: 21: ParallelTransformerLayerPipe
[default0]:stage=20 layers=1
[default0]: 22: ParallelTransformerLayerPipe
[default0]:stage=21 layers=1
[default0]: 23: ParallelTransformerLayerPipe
[default0]:stage=22 layers=1
[default0]: 24: ParallelTransformerLayerPipe
[default0]:stage=23 layers=1
[default0]: 25: ParallelTransformerLayerPipe
[default0]:stage=24 layers=1
[default0]: 26: ParallelTransformerLayerPipe
[default0]:stage=25 layers=1
[default0]: 27: ParallelTransformerLayerPipe
[default0]:stage=26 layers=1
[default0]: 28: ParallelTransformerLayerPipe
[default0]:stage=27 layers=1
[default0]: 29: ParallelTransformerLayerPipe
[default0]:stage=28 layers=1
[default0]: 30: ParallelTransformerLayerPipe
[default0]:stage=29 layers=1
[default0]: 31: ParallelTransformerLayerPipe
[default0]:stage=30 layers=1
[default0]: 32: ParallelTransformerLayerPipe
[default0]:stage=31 layers=1
[default0]: 33: ParallelTransformerLayerPipe
[default0]:stage=32 layers=1
[default0]: 34: ParallelTransformerLayerPipe
[default0]:stage=33 layers=1
[default0]: 35: ParallelTransformerLayerPipe
[default0]:stage=34 layers=1
[default0]: 36: ParallelTransformerLayerPipe
[default0]:stage=35 layers=1
[default0]: 37: ParallelTransformerLayerPipe
[default0]:stage=36 layers=1
[default0]: 38: ParallelTransformerLayerPipe
[default0]:stage=37 layers=1
[default0]: 39: ParallelTransformerLayerPipe
[default0]:stage=38 layers=1
[default0]: 40: ParallelTransformerLayerPipe
[default0]:stage=39 layers=1
[default0]: 41: ParallelTransformerLayerPipe
[default0]:stage=40 layers=1
[default0]: 42: ParallelTransformerLayerPipe
[default0]:stage=41 layers=1
[default0]: 43: ParallelTransformerLayerPipe
[default0]:stage=42 layers=1
[default0]: 44: ParallelTransformerLayerPipe
[default0]:stage=43 layers=1
[default0]: 45: ParallelTransformerLayerPipe
[default0]:stage=44 layers=1
[default0]: 46: ParallelTransformerLayerPipe
[default0]:stage=45 layers=1
[default0]: 47: ParallelTransformerLayerPipe
[default0]:stage=46 layers=1
[default0]: 48: ParallelTransformerLayerPipe
[default0]:stage=47 layers=1
[default0]: 49: ParallelTransformerLayerPipe
[default0]:stage=48 layers=1
[default0]: 50: ParallelTransformerLayerPipe
[default0]:stage=49 layers=1
[default0]: 51: ParallelTransformerLayerPipe
[default0]:stage=50 layers=1
[default0]: 52: ParallelTransformerLayerPipe
[default0]:stage=51 layers=1
[default0]: 53: ParallelTransformerLayerPipe
[default0]:stage=52 layers=1
[default0]: 54: ParallelTransformerLayerPipe
[default0]:stage=53 layers=1
[default0]: 55: ParallelTransformerLayerPipe
[default0]:stage=54 layers=1
[default0]: 56: ParallelTransformerLayerPipe
[default0]:stage=55 layers=1
[default0]: 57: ParallelTransformerLayerPipe
[default0]:stage=56 layers=1
[default0]: 58: ParallelTransformerLayerPipe
[default0]:stage=57 layers=1
[default0]: 59: ParallelTransformerLayerPipe
[default0]:stage=58 layers=1
[default0]: 60: ParallelTransformerLayerPipe
[default0]:stage=59 layers=1
[default0]: 61: ParallelTransformerLayerPipe
[default0]:stage=60 layers=1
[default0]: 62: ParallelTransformerLayerPipe
[default0]:stage=61 layers=1
[default0]: 63: ParallelTransformerLayerPipe
[default0]:stage=62 layers=1
[default0]: 64: ParallelTransformerLayerPipe
[default0]:stage=63 layers=1
[default0]: 65: ParallelTransformerLayerPipe
[default0]:stage=64 layers=1
[default0]: 66: ParallelTransformerLayerPipe
[default0]:stage=65 layers=1
[default0]: 67: ParallelTransformerLayerPipe
[default0]:stage=66 layers=1
[default0]: 68: ParallelTransformerLayerPipe
[default0]:stage=67 layers=1
[default0]: 69: ParallelTransformerLayerPipe
[default0]:stage=68 layers=1
[default0]: 70: ParallelTransformerLayerPipe
[default0]:stage=69 layers=1
[default0]: 71: ParallelTransformerLayerPipe
[default0]:stage=70 layers=3
[default0]: 72: ParallelTransformerLayerPipe
[default0]: 73: undo
[default0]: 74: MixedFusedLayerNorm
[default0]:stage=71 layers=2
[default0]: 75: EmbeddingPipe
[default0]: 76: float16_to_fp32
[default0]: loss: CrossEntropy
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default3]:Building extension module utils...
[default3]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default1]:Loading extension module utils...
[default3]:ninja: no work to do.
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.3643937110900879 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.3672788143157959 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Time to load utils op: 0.3677208423614502 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.36686134338378906 seconds
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default2]:Building extension module utils... [default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.05592012405395508 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.05657339096069336 seconds [default3]:Time to load utils op: 0.07184386253356934 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.056420326232910156 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:ninja: no work to do. [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.26013827323913574 seconds [default0]:[2022-09-03 19:40:35,830] [INFO] [utils.py:827:see_memory_usage] After Building Model [default0]:[2022-09-03 19:40:35,831] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:40:35,831] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.45 GB, percent = 7.2% [default0]:setting training iterations to 3100 [default0]:> learning rate decay style: constant [default0]:DeepSpeed is enabled. [default0]:[2022-09-03 19:40:35,832] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2458188533782959 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.23775601387023926 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3429086208343506 seconds [default1]:Time to load utils op: 0.34291625022888184 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2807958126068115 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.2807581424713135 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.28076624870300293 seconds [default1]:Time to load utils op: 0.2807803153991699 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.24588418006896973 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.24588561058044434 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.24693942070007324 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.24191904067993164 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.23035430908203125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2418975830078125 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.24194002151489258 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.24651002883911133 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2751593589782715 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.34171342849731445 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.34093236923217773 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22107458114624023 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.24191832542419434 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.23779940605163574 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2210524082183838 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.24530816078186035 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22106122970581055 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.22079157829284668 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.283923864364624 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.22080636024475098 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2208240032196045 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2646160125732422 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2208414077758789 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.27515506744384766 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2305283546447754 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2765309810638428 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2755625247955322 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2752506732940674 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2210547924041748 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2299346923828125 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.23003935813903809 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.28343796730041504 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.28307533264160156 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.283052921295166 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.22184276580810547 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.26434850692749023 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.26430416107177734 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3481478691101074 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3481309413909912 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.264101505279541 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2659637928009033 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3427250385284424 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.34269261360168457 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22136402130126953 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2212660312652588 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30632853507995605 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3063318729400635 seconds [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.25540924072265625 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.24228739738464355 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.24238824844360352 seconds [default2]:Time to load utils op: 0.30628204345703125 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.24223613739013672 seconds [default4]:Loading extension module utils... [default0]:Loading extension module utils... [default4]:Time to load utils op: 0.25539350509643555 seconds [default0]:Time to load utils op: 0.2430582046508789 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.25538015365600586 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2836921215057373 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2837679386138916 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2843821048736572 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2841212749481201 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3481283187866211 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3429391384124756 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.33312344551086426 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.26588869094848633 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3481259346008301 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.24175643920898438 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2669074535369873 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2906200885772705 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2348003387451172 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23041367530822754 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.23694062232971191 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.23694396018981934 seconds [default5]:Time to load utils op: 0.23778438568115234 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3428995609283447 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.23040461540222168 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.22100234031677246 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2303926944732666 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.23039865493774414 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.23571419715881348 seconds [default3]:Loading extension module utils... [default4]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.29062414169311523 seconds [default4]:Time to load utils op: 0.24247264862060547 seconds [default3]:Time to load utils op: 0.3063046932220459 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.26609039306640625 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2751657962799072 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2751646041870117 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.27516889572143555 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2561323642730713 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2557857036590576 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.25270938873291016 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2554917335510254 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2531321048736572 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.253680944442749 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.25277233123779297 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.22214007377624512 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22214317321777344 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2698020935058594 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2697889804840088 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2995448112487793 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.29953956604003906 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2698171138763428 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2698366641998291 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3079354763031006 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30792975425720215 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2555582523345947 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3331325054168701 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005633831024169922 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23803472518920898 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.227766752243042 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2277507781982422 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.25539398193359375 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.22826623916625977 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.22777676582336426 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.24173951148986816 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.24299931526184082 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3032383918762207 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.303253173828125 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.23484015464782715 seconds [default0]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.24300837516784668 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3310055732727051 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.24173808097839355 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.22155332565307617 seconds [default2]:Time to load utils op: 0.33100318908691406 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.23476529121398926 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3032259941101074 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.24176740646362305 seconds [default0]:Time to load utils op: 0.33101463317871094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.22160887718200684 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.24301958084106445 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.242997407913208 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2662835121154785 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22775530815124512 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.29062604904174805 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.22825384140014648 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3331282138824463 seconds [default2]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.26728081703186035 seconds [default2]:Time to load utils op: 0.27005982398986816 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.27006077766418457 seconds [default0]:Time to load utils op: 0.270047664642334 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2660856246948242 seconds [default1]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.26612305641174316 seconds [default1]:Time to load utils op: 0.33313632011413574 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.22826838493347168 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.27434277534484863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3079099655151367 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.27463603019714355 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.27440595626831055 seconds [default5]:Time to load utils op: 0.24196553230285645 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3079347610473633 seconds [default6]:Time to load utils op: 0.2414531707763672 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3180248737335205 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31708812713623047 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31728076934814453 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2532660961151123 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.27474546432495117 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2524435520172119 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23836779594421387 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.23691558837890625 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.237382173538208 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22939777374267578 seconds [default2]:Loading extension module utils... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.27423858642578125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.23205924034118652 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.34098267555236816 seconds [default2]:Time to load utils op: 0.3409733772277832 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.34099888801574707 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20745086669921875 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20680594444274902 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.27515339851379395 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22826886177062988 seconds [default0]:Time to load utils op: 0.3409872055053711 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20671582221984863 seconds [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22826766967773438 seconds [default6]:Time to load utils op: 0.2741999626159668 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2756822109222412 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.29952335357666016 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.26294922828674316 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.27364325523376465 seconds [default7]:Time to load utils op: 0.26294803619384766 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3372175693511963 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.25493359565734863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2413632869720459 seconds [default1]:Loading extension module utils... [default5]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2995576858520508 seconds [default2]:Loading extension module utils... [default5]:Time to load utils op: 0.25649571418762207 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.26383447647094727 seconds [default2]:Time to load utils op: 0.25217700004577637 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2519097328186035 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.25647664070129395 seconds [default4]:Loading extension module utils... [default1]:Time to load utils op: 0.2333087921142578 seconds [default4]:Time to load utils op: 0.2575240135192871 seconds [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2548096179962158 seconds [default6]:Time to load utils op: 0.2564566135406494 seconds [default2]:Time to load utils op: 0.23331212997436523 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.23420119285583496 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.23326730728149414 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.27229952812194824 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3081669807434082 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.337221622467041 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.2728414535522461 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2722933292388916 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3372061252593994 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2635629177093506 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.22968363761901855 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2290198802947998 seconds [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.23107266426086426 seconds [default3]:Time to load utils op: 0.22912240028381348 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2575242519378662 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3092684745788574 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3082284927368164 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30322837829589844 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3372042179107666 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.26264166831970215 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2558293342590332 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.255540132522583 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.29615068435668945 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2558774948120117 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.25551414489746094 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.23141264915466309 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2551145553588867 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2496175765991211 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.23695755004882812 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2512080669403076 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.24891972541809082 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.25582146644592285 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2495436668395996 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.27006030082702637 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.24738812446594238 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.25038623809814453 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2906215190887451 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.27039623260498047 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.29617977142333984 seconds [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.23206019401550293 seconds [default3]:Time to load utils op: 0.23206257820129395 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23105788230895996 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.29618310928344727 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3093075752258301 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2638387680053711 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.26224231719970703 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2705271244049072 seconds [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.33100199699401855 seconds [default7]:Time to load utils op: 0.2703535556793213 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.23103642463684082 seconds [default7]:Time to load utils op: 0.29616594314575195 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20659947395324707 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.23106956481933594 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21013760566711426 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21012568473815918 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.23205113410949707 seconds [default2]:Loading extension module utils... [default5]:Loading extension module utils... [default7]:Loading extension module utils... [default5]:Time to load utils op: 0.21012592315673828 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23204779624938965 seconds [default7]:Time to load utils op: 0.21010899543762207 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.23205232620239258 seconds [default2]:Time to load utils op: 0.23204708099365234 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.23204612731933594 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2847867012023926 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2847778797149658 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.262251615524292 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.28478026390075684 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.27121567726135254 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.22828245162963867 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.23050642013549805 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2564737796783447 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.22827363014221191 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.22825407981872559 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2303004264831543 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2847869396209717 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.23029756546020508 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.7481048107147217 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.7480509281158447 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.7479038238525391 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.7474989891052246 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00042557716369628906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005195140838623047 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003528594970703125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0003361701965332031 seconds [default1]:Time to load utils op: 0.00038313865661621094 seconds [default7]:Time to load utils op: 0.0004429817199707031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004985332489013672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006704330444335938 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0016765594482421875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0016400814056396484 seconds [default5]:Time to load utils op: 0.0017240047454833984 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0016269683837890625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004715919494628906 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.001093149185180664 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0004928112030029297 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005164146423339844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0010721683502197266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0010347366333007812 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009255409240722656 seconds [default1]:Time to load utils op: 0.0010237693786621094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008809566497802734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007765293121337891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0007557868957519531 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005404949188232422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006353855133056641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007908344268798828 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005650520324707031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008566379547119141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005939006805419922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006649494171142578 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0013279914855957031 seconds [default0]:Time to load utils op: 0.0012469291687011719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007746219635009766 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004775524139404297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0009052753448486328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005481243133544922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005612373352050781 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005109310150146484 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005483627319335938 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005385875701904297 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005965232849121094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006778240203857422 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004622936248779297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005624294281005859 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005743503570556641 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005669593811035156 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007014274597167969 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005788803100585938 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00048351287841796875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006155967712402344 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005178451538085938 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00048542022705078125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005140304565429688 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045752525329589844 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008723735809326172 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004506111145019531 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006616115570068359 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000537872314453125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005176067352294922 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046896934509277344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006551742553710938 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005826950073242188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006918907165527344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007700920104980469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0011272430419921875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0012137889862060547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005998611450195312 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0010063648223876953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006022453308105469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006887912750244141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006873607635498047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0011208057403564453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007193088531494141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005850791931152344 seconds [default4]:Time to load utils op: 0.0005617141723632812 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005209445953369141 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007262229919433594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005838871002197266 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006184577941894531 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.0005755424499511719 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005447864532470703 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005745887756347656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004947185516357422 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005035400390625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007038116455078125 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006699562072753906 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004496574401855469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006210803985595703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008301734924316406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005922317504882812 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005257129669189453 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005331039428710938 seconds [default6]:Time to load utils op: 0.0006465911865234375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.001140594482421875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008027553558349609 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009598731994628906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009915828704833984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006518363952636719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005235671997070312 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005116462707519531 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0011980533599853516 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0014071464538574219 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00037980079650878906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004904270172119141 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005021095275878906 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006339550018310547 seconds [default0]:Time to load utils op: 0.0006630420684814453 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006415843963623047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0011472702026367188 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007681846618652344 seconds [default1]:Time to load utils op: 0.0006532669067382812 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0007991790771484375 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007996559143066406 seconds [default0]:Time to load utils op: 0.0005831718444824219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005748271942138672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006225109100341797 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009069442749023438 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009453296661376953 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009243488311767578 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007150173187255859 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006434917449951172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006127357482910156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008378028869628906 seconds [default3]:Time to load utils op: 0.0004737377166748047 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005106925964355469 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006203651428222656 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006740093231201172 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005352497100830078 seconds [default5]:Time to load utils op: 0.0005972385406494141 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006666183471679688 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006532669067382812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006337165832519531 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006575584411621094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006742477416992188 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005617141723632812 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default2]:Time to load utils op: 0.0008282661437988281 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Time to load utils op: 0.0008225440979003906 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008749961853027344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007700920104980469 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005075931549072266 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0005438327789306641 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005517005920410156 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006940364837646484 seconds [default2]:Time to load utils op: 0.0006291866302490234 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004456043243408203 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005893707275390625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0006735324859619141 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007319450378417969 seconds [default6]:Time to load utils op: 0.0007297992706298828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007500648498535156 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006597042083740234 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005903244018554688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006256103515625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Time to load utils op: 0.0006697177886962891 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006139278411865234 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Time to load utils op: 0.0007483959197998047 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007176399230957031 seconds [default0]:Time to load utils op: 0.0006353855133056641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005805492401123047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005507469177246094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00041604042053222656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000576019287109375 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012154579162597656 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009131431579589844 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004887580871582031 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0004911422729492188 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011012554168701172 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004899501800537109 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0012373924255371094 seconds [default3]:Time to load utils op: 0.0004887580871582031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004134178161621094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00045943260192871094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0012845993041992188 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006806850433349609 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.001093149185180664 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005419254302978516 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006985664367675781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006206035614013672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Time to load utils op: 0.0005974769592285156 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006308555603027344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005075931549072266 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006625652313232422 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005939006805419922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0006890296936035156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006399154663085938 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007343292236328125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009050369262695312 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.001079559326171875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008158683776855469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0007996559143066406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010373592376708984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010023117065429688 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007872581481933594 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000530242919921875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006742477416992188 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005617141723632812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010018348693847656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007696151733398438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007169246673583984 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006039142608642578 seconds [default7]:Time to load utils op: 0.0006473064422607422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005869865417480469 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006000995635986328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006771087646484375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006554126739501953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009427070617675781 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007035732269287109 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006072521209716797 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007569789886474609 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007009506225585938 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007615089416503906 seconds [default0]:Time to load utils op: 0.0005178451538085938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008652210235595703 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004885196685791016 seconds [default5]:Time to load utils op: 0.0005924701690673828 seconds [default6]:Time to load utils op: 0.0006213188171386719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007021427154541016 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008995532989501953 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0011734962463378906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0007352828979492188 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006368160247802734 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006933212280273438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004999637603759766 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007810592651367188 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005574226379394531 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006155967712402344 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005402565002441406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000560760498046875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005304813385009766 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008285045623779297 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005228519439697266 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00081634521484375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007228851318359375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004987716674804688 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000640869140625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006833076477050781 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006728172302246094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0004398822784423828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007531642913818359 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005702972412109375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005247592926025391 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005786418914794922 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00047850608825683594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006151199340820312 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Time to load utils op: 0.0006475448608398438 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005671977996826172 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.0005631446838378906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007388591766357422 seconds [default4]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007114410400390625 seconds [default4]:Time to load utils op: 0.0005476474761962891 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005605220794677734 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005986690521240234 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006091594696044922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005247592926025391 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009789466857910156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007483959197998047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007975101470947266 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default7]:Time to load utils op: 0.0009479522705078125 seconds [default5]:Time to load utils op: 0.0009341239929199219 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005245208740234375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007882118225097656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Time to load utils op: 0.0006496906280517578 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005450248718261719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00035071372985839844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005583763122558594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006124973297119141 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005338191986083984 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005617141723632812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008156299591064453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005385875701904297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005762577056884766 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005741119384765625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008871555328369141 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004849433898925781 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005676746368408203 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 19:40:36,610] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 19:40:36,611] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 19:40:36,611] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 19:40:36,611] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 19:40:36,611] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:[2022-09-03 19:40:36,653] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 19:40:36,654] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:40:36,654] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20667386054992676 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30288147926330566 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30339479446411133 seconds [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.24387240409851074 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005030632019042969 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30367016792297363 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20658564567565918 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20652103424072266 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.20226740837097168 seconds [default0]:[2022-09-03 19:40:36,885] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0016207695007324219 seconds [default0]:[2022-09-03 19:40:36,885] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:40:36,885] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00041222572326660156 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003845691680908203 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003402233123779297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0015916824340820312 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0015671253204345703 seconds [default0]:[2022-09-03 19:40:36,949] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 19:40:36,950] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:40:36,950] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:[2022-09-03 19:40:36,975] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 19:40:36,976] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:40:36,976] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3% [default0]:[2022-09-03 19:40:37,003] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 19:40:37,004] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:40:37,004] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3% [default0]:[2022-09-03 19:40:37,030] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 19:40:37,030] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:40:37,031] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3% [default0]:[2022-09-03 19:40:37,114] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 19:40:37,115] [INFO] 
[utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:40:37,115] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3% [default0]:[2022-09-03 19:40:37,140] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 19:40:37,141] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:40:37,141] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3% [default0]:[2022-09-03 19:40:37,141] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-03 19:40:37,141] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 19:40:37,141] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 19:40:37,141] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 19:40:37,141] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 19:40:37,141] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 19:40:37,141] [INFO] [config.py:991:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 19:40:37,141] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] amp_params ................... 
False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 19:40:37,142] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-03 19:40:37,143] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005002021789550781 seconds [default0]:[2022-09-03 19:40:37,143] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,751] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:40:37,744] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default1]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:40:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:40:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:40:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:40:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:40:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:40:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:40:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:40:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:40:47,461] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default3]:[2022-09-03 19:40:48,100] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default2]:[2022-09-03 19:40:48,201] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default2]:[2022-09-03 19:40:48,608] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default7]:[2022-09-03 19:40:48,663] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default0]:[2022-09-03 19:40:49,085] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default1]:[2022-09-03 19:40:49,081] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default7]:[2022-09-03 19:40:49,318] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default2]:[2022-09-03 19:40:50,218] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default4]:[2022-09-03 19:40:50,228] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default6]:[2022-09-03 19:40:50,478] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default7]:[2022-09-03 19:40:50,484] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default3]:[2022-09-03 19:40:50,611] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default3]:[2022-09-03 19:40:50,615] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default4]:[2022-09-03 19:40:50,880] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default5]:[2022-09-03 19:40:50,871] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default1]:[2022-09-03 19:40:51,015] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default0]:[2022-09-03 19:40:50,984] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default2]:[2022-09-03 19:40:51,084] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default0]:[2022-09-03 19:40:51,076] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default1]:[2022-09-03 19:40:51,074] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default3]:[2022-09-03 19:40:51,080] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default3]:[2022-09-03 19:40:51,105] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default2]:[2022-09-03 19:40:51,082] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default4]:[2022-09-03 19:40:51,152] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default7]:[2022-09-03 19:40:51,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default6]:[2022-09-03 19:40:51,210] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default7]:[2022-09-03 19:40:51,269] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default6]:[2022-09-03 19:40:51,211] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default3]:[2022-09-03 19:40:51,466] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default7]:[2022-09-03 19:40:51,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default6]:[2022-09-03 19:40:51,655] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default7]:[2022-09-03 19:40:51,667] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default0]:[2022-09-03 19:40:51,707] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default1]:[2022-09-03 19:40:51,708] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default3]:[2022-09-03 19:40:51,697] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default2]:[2022-09-03 19:40:51,703] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default7]:[2022-09-03 19:40:51,704] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default7]:[2022-09-03 19:40:51,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default3]:[2022-09-03 19:40:51,898] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default2]:[2022-09-03 19:40:51,907] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default3]:[2022-09-03 19:40:52,020] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default3]:[2022-09-03 19:40:52,053] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default2]:[2022-09-03 19:40:52,013] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default3]:[2022-09-03 19:40:52,158] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default7]:[2022-09-03 19:40:52,090] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default3]:[2022-09-03 19:40:52,171] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default7]:[2022-09-03 19:40:52,190] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default6]:[2022-09-03 19:40:52,183] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default2]:[2022-09-03 19:40:52,308] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default3]:[2022-09-03 19:40:52,341] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default5]:[2022-09-03 19:40:52,326] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default6]:[2022-09-03 19:40:52,388] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default7]:[2022-09-03 19:40:52,388] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default6]:[2022-09-03 19:40:52,439] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default3]:[2022-09-03 19:40:52,468] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default5]:[2022-09-03 19:40:52,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default4]:[2022-09-03 19:40:52,531] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default6]:[2022-09-03 19:40:52,759] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default7]:[2022-09-03 19:40:52,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default3]:[2022-09-03 19:40:52,718] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 99 [default5]:[2022-09-03 19:40:52,710] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default4]:[2022-09-03 19:40:52,759] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default4]:[2022-09-03 19:40:52,714] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default5]:[2022-09-03 19:40:52,765] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default0]:[2022-09-03 19:40:52,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default1]:[2022-09-03 19:40:52,763] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default3]:[2022-09-03 19:40:52,798] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default2]:[2022-09-03 19:40:52,800] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default3]:[2022-09-03 19:40:52,830] [INFO] [engine.py:2763:_load_zero_checkpoint] 
loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default6]:[2022-09-03 19:40:52,838] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default5]:[2022-09-03 19:40:52,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default3]:[2022-09-03 19:40:53,024] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default2]:[2022-09-03 19:40:53,028] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default6]:[2022-09-03 19:40:53,003] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default4]:[2022-09-03 19:40:52,982] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default3]:[2022-09-03 19:40:52,986] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default7]:[2022-09-03 19:40:53,054] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default6]:[2022-09-03 19:40:53,154] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default6]:[2022-09-03 19:40:53,122] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default7]:[2022-09-03 19:40:53,119] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default5]:[2022-09-03 19:40:53,119] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default6]:[2022-09-03 19:40:53,107] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default6]:[2022-09-03 19:40:53,084] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default6]:[2022-09-03 19:40:53,104] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default5]:[2022-09-03 19:40:53,158] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default4]:[2022-09-03 19:40:53,154] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default7]:[2022-09-03 19:40:53,136] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default7]:[2022-09-03 19:40:53,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default6]:[2022-09-03 19:40:53,134] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default1]:[2022-09-03 19:40:53,264] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default2]:[2022-09-03 19:40:53,201] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default5]:[2022-09-03 19:40:53,230] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default4]:[2022-09-03 19:40:53,218] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default5]:[2022-09-03 19:40:53,231] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default4]:[2022-09-03 19:40:53,189] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default0]:[2022-09-03 19:40:53,281] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default5]:[2022-09-03 19:40:53,270] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default5]:[2022-09-03 19:40:53,346] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default4]:[2022-09-03 19:40:53,265] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default7]:[2022-09-03 19:40:53,353] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default0]:[2022-09-03 19:40:53,363] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default3]:[2022-09-03 19:40:53,435] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default7]:[2022-09-03 19:40:53,422] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default2]:[2022-09-03 19:40:53,401] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default6]:[2022-09-03 19:40:53,398] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default6]:[2022-09-03 19:40:53,449] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default7]:[2022-09-03 19:40:53,445] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default1]:[2022-09-03 19:40:53,376] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default7]:[2022-09-03 19:40:53,547] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default0]:[2022-09-03 19:40:53,499] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default1]:[2022-09-03 19:40:53,509] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default2]:[2022-09-03 19:40:53,509] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default3]:[2022-09-03 19:40:53,505] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default7]:[2022-09-03 19:40:53,673] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default6]:[2022-09-03 19:40:53,658] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default6]:[2022-09-03 19:40:53,669] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default7]:[2022-09-03 19:40:53,667] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default6]:[2022-09-03 19:40:53,693] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default4]:[2022-09-03 19:40:53,753] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default1]:[2022-09-03 19:40:53,757] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default2]:[2022-09-03 19:40:53,735] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default1]:[2022-09-03 19:40:53,711] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default0]:[2022-09-03 19:40:53,729] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default7]:[2022-09-03 19:40:53,732] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default3]:[2022-09-03 19:40:53,745] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default3]:[2022-09-03 19:40:53,730] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default0]:[2022-09-03 19:40:53,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default0]:[2022-09-03 19:40:53,805] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default0]:[2022-09-03 19:40:53,803] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default5]:[2022-09-03 19:40:53,771] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default3]:[2022-09-03 19:40:53,842] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default1]:[2022-09-03 19:40:53,805] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default4]:[2022-09-03 19:40:53,791] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default1]:[2022-09-03 19:40:53,772] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default0]:[2022-09-03 19:40:53,774] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default7]:[2022-09-03 19:40:53,835] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default3]:[2022-09-03 19:40:53,870] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default2]:[2022-09-03 19:40:53,796] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default4]:[2022-09-03 19:40:53,799] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default6]:[2022-09-03 19:40:53,827] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default7]:[2022-09-03 19:40:53,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default1]:[2022-09-03 19:40:53,832] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default2]:[2022-09-03 19:40:53,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default5]:[2022-09-03 19:40:53,811] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default7]:[2022-09-03 19:40:53,844] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default1]:[2022-09-03 19:40:53,900] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default0]:[2022-09-03 19:40:53,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default0]:[2022-09-03 19:40:53,881] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default3]:[2022-09-03 19:40:53,946] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default1]:[2022-09-03 19:40:53,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default3]:[2022-09-03 19:40:53,949] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default3]:[2022-09-03 19:40:53,959] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default5]:[2022-09-03 19:40:53,918] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default4]:[2022-09-03 19:40:53,918] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default6]:[2022-09-03 19:40:53,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default5]:[2022-09-03 19:40:53,944] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default2]:[2022-09-03 19:40:54,020] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default2]:[2022-09-03 19:40:54,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default6]:[2022-09-03 19:40:54,055] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default7]:[2022-09-03 19:40:54,049] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default1]:[2022-09-03 19:40:54,023] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default7]:[2022-09-03 19:40:54,021] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default0]:[2022-09-03 19:40:54,023] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default4]:[2022-09-03 19:40:54,056] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-03 19:40:54,063] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default4]:[2022-09-03 19:40:53,995] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default4]:[2022-09-03 19:40:54,130] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default2]:[2022-09-03 19:40:54,076] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default4]:[2022-09-03 19:40:54,118] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default1]:[2022-09-03 19:40:54,161] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default5]:[2022-09-03 19:40:54,107] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default2]:[2022-09-03 19:40:54,202] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default6]:[2022-09-03 19:40:54,223] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 22 [default0]:[2022-09-03 19:40:54,215] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default1]:[2022-09-03 19:40:54,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default4]:[2022-09-03 19:40:54,187] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default2]:[2022-09-03 19:40:54,262] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default1]:[2022-09-03 19:40:54,236] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default3]:[2022-09-03 19:40:54,230] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default0]:[2022-09-03 19:40:54,235] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default5]:[2022-09-03 19:40:54,195] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default2]:[2022-09-03 19:40:54,316] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded 
universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default4]:[2022-09-03 19:40:54,295] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default5]:[2022-09-03 19:40:54,313] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default2]:[2022-09-03 19:40:54,349] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default3]:[2022-09-03 19:40:54,397] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default5]:[2022-09-03 19:40:54,413] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default4]:[2022-09-03 19:40:54,415] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default5]:[2022-09-03 19:40:54,403] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default6]:[2022-09-03 19:40:54,391] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default4]:[2022-09-03 19:40:54,398] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default0]:[2022-09-03 19:40:54,406] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default7]:[2022-09-03 19:40:54,449] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default6]:[2022-09-03 19:40:54,452] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default2]:[2022-09-03 19:40:54,390] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default1]:[2022-09-03 19:40:54,458] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default2]:[2022-09-03 19:40:54,377] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default4]:[2022-09-03 19:40:54,453] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default5]:[2022-09-03 19:40:54,449] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default5]:[2022-09-03 19:40:54,462] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default4]:[2022-09-03 19:40:54,460] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default2]:[2022-09-03 19:40:54,556] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default3]:[2022-09-03 19:40:54,547] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default4]:[2022-09-03 19:40:54,482] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default5]:[2022-09-03 19:40:54,481] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default6]:[2022-09-03 19:40:54,490] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default0]:[2022-09-03 19:40:54,560] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default5]:[2022-09-03 19:40:54,574] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default6]:[2022-09-03 19:40:54,583] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default7]:[2022-09-03 19:40:54,577] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default2]:[2022-09-03 19:40:54,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default6]:[2022-09-03 19:40:54,761] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default1]:[2022-09-03 19:40:54,685] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default1]:[2022-09-03 19:40:54,688] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default0]:[2022-09-03 19:40:54,676] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default0]:[2022-09-03 19:40:54,683] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default7]:[2022-09-03 19:40:54,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default3]:[2022-09-03 19:40:54,783] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default1]:[2022-09-03 19:40:54,763] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default0]:[2022-09-03 19:40:54,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default1]:[2022-09-03 19:40:54,808] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default0]:[2022-09-03 19:40:54,847] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default0]:[2022-09-03 19:40:54,810] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default2]:[2022-09-03 19:40:54,777] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default2]:[2022-09-03 19:40:54,808] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default3]:[2022-09-03 19:40:54,777] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default0]:[2022-09-03 19:40:54,833] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default1]:[2022-09-03 19:40:54,809] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default1]:[2022-09-03 19:40:54,840] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default2]:[2022-09-03 19:40:54,782] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default7]:[2022-09-03 19:40:54,922] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default1]:[2022-09-03 19:40:54,911] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default7]:[2022-09-03 19:40:54,965] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default2]:[2022-09-03 19:40:54,947] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default6]:[2022-09-03 19:40:54,941] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default0]:[2022-09-03 19:40:54,912] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default1]:[2022-09-03 19:40:54,883] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default0]:[2022-09-03 19:40:54,873] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default1]:[2022-09-03 19:40:54,917] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default0]:[2022-09-03 19:40:54,946] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default3]:[2022-09-03 19:40:54,904] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default4]:[2022-09-03 19:40:54,912] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default1]:[2022-09-03 19:40:54,970] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default0]:[2022-09-03 19:40:54,983] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default4]:[2022-09-03 19:40:55,017] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default5]:[2022-09-03 19:40:55,021] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default7]:[2022-09-03 19:40:55,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default5]:[2022-09-03 19:40:54,996] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default6]:[2022-09-03 19:40:55,047] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default1]:[2022-09-03 19:40:55,007] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default4]:[2022-09-03 19:40:54,992] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default5]:[2022-09-03 19:40:55,122] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default5]:[2022-09-03 19:40:55,110] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default0]:[2022-09-03 19:40:55,088] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default4]:[2022-09-03 19:40:55,105] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default2]:[2022-09-03 19:40:55,099] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default0]:[2022-09-03 19:40:55,076] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default6]:[2022-09-03 19:40:55,188] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default4]:[2022-09-03 19:40:55,175] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default4]:[2022-09-03 19:40:55,199] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default5]:[2022-09-03 19:40:55,205] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default0]:[2022-09-03 19:40:55,315] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default1]:[2022-09-03 19:40:55,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default4]:[2022-09-03 19:40:55,268] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default0]:[2022-09-03 19:40:55,279] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default6]:[2022-09-03 19:40:55,336] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default0]:[2022-09-03 19:40:55,393] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default2]:[2022-09-03 19:40:55,448] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default6]:[2022-09-03 19:40:55,395] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default4]:[2022-09-03 19:40:55,510] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default5]:[2022-09-03 19:40:55,511] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default5]:[2022-09-03 19:40:55,570] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default5]:[2022-09-03 19:40:55,591] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default4]:[2022-09-03 19:40:55,719] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default2]:[2022-09-03 19:40:55,768] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default1]:[2022-09-03 19:40:55,809] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default5]:[2022-09-03 19:40:55,851] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default1]:[2022-09-03 19:40:55,864] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default1]:[2022-09-03 19:40:56,075] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default4]:[2022-09-03 19:40:58,652] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default3]:[2022-09-03 19:40:59,931] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default0]:[2022-09-03 19:41:01,422] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]:could not find arguments in the checkpoint ... 
[default0]: checkpoint version 3.0 [default5]:[2022-09-03 19:41:02,198] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default1]:[2022-09-03 19:41:03,073] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default2]:[2022-09-03 19:41:04,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default6]:[2022-09-03 19:41:04,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 0 [default7]:[2022-09-03 19:41:04,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default7]:time (ms) | load-checkpoint: 26347.62 [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after 
model, optimizer, and learning rate scheduler are built] datetime: 2022-09-03 19:41:04 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 26624 [default0]: test: 2048 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.149263 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.035820 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.002736 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.033 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.072780 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.062 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.066284 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.064 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.075112 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.089 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.053923 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.103 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.066115 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.137 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.091343 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.050 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.175138 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.143 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.066705 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.030 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.045432 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.059496 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.058 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.080695 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.029 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.153609 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.061 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.089554 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.011 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.091880 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.057 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.044030 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.042 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.104549 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.042 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.077596 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.047 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.116683 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.076133 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.076 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.068562 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.043 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.108854 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.070163 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.007 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.089822 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.069 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.055411 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.070 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.079060 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.063 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.070895 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.075 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.159159 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.123 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.052129 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios:
[default0]: dataset 0, input: 0.0330676, achieved: 0.0330676
[default0]: dataset 1, input: 0.0112421, achieved: 0.0112421
[default0]: dataset 2, input: 0.130272, achieved: 0.130272
[default0]: dataset 3, input: 0.221712, achieved: 0.221712
[default0]: dataset 4, input: 0.106678, achieved: 0.106678
[default0]: dataset 5, input: 0.00155951, achieved: 0.00155955
[default0]: dataset 6, input: 0.13054, achieved: 0.13054
[default0]: dataset 7, input: 0.010918, achieved: 0.0109181
[default0]: dataset 8, input: 0.000110214, achieved: 0.000110257
[default0]: dataset 9, input: 0.00549238, achieved: 0.00549235
[default0]: dataset 10, input: 0.000402122, achieved: 0.000402094
[default0]: dataset 11, input: 0.00747007, achieved: 0.00747007
[default0]: dataset 12, input: 0.000619047, achieved: 0.000619024
[default0]: dataset 13, input: 0.00103353, achieved: 0.0010336
[default0]: dataset 14, input: 0.000501201, achieved: 0.000501226
[default0]: dataset 15, input: 0.000667277, achieved: 0.000667231
[default0]: dataset 16, input: 0.000359281, achieved: 0.000359326
[default0]: dataset 17, input: 0.000508443, achieved: 0.000508519
[default0]: dataset 18, input: 0.00211373, achieved: 0.0021138
[default0]: dataset 19, input: 0.000912995, achieved: 0.000912961
[default0]: dataset 20, input: 0.00124543, achieved: 0.00124546
[default0]: dataset 21, input: 0.000315887, achieved: 0.00031594
[default0]: dataset 22, input: 0.0813721, achieved: 0.0813721
[default0]: dataset 23, input: 0.0552939, achieved: 0.0552939
[default0]: dataset 24, input: 0.0495415, achieved: 0.0495414
[default0]: dataset 25, input: 0.0246164, achieved: 0.0246163
[default0]: dataset 26, input: 0.120917, achieved: 0.120917
[default0]: dataset 27, input: 0.000517703, achieved: 0.000517666
[default0]:> elapsed time for building blendable dataset indices: 0.33 (sec)
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.271482 seconds [default0]: number of documents: 2940097 [default0]: > dataset split: [default0]: valid: [default0]: document indices in [0, 2940097) total of 2940097 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003456 seconds [default0]: number of documents: 2940097 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003800 seconds [default0]: number of documents: 2940097 [default0]: > WARNING: could not find index map files, building the indices on rank 0 ... [default0]:Skipping sample id=2746508. Maximum sequence length: 2049, sample length: 3712 [default0]:Skipping sample id=2498573. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2730344. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2750301. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2731299. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2714242. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2713924. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2747869. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2753271. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2711240. 
Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2744723. Maximum sequence length: 2049, sample length: 5812 [default0]:Skipping sample id=2479868. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2731521. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2495015. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2744436. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2752411. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2742577. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2469193. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2752482. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2742212. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2734408. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2736213. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2733415. Maximum sequence length: 2049, sample length: 4316 [default0]:Skipping sample id=2729647. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2755482. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2738429. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2712050. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2751455. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2755865. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2733640. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2734447. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2756342. 
Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2754189. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2730048. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2751878. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2738005. Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2743100. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2713280. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2712593. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2723920. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2722898. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725270. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2750860. Maximum sequence length: 2049, sample length: 3712 [default0]:Skipping sample id=2750574. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2736057. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2469090. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2717097. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2746315. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2745382. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2754173. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2752874. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2725835. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2721364. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2745381. 
Maximum sequence length: 2049, sample length: 4733 [default0]:Skipping sample id=2493897. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2718904. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2753262. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2714107. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2737530. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2752402. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2726763. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2746971. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2734931. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2739617. Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2711249. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2732088. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2731765. Maximum sequence length: 2049, sample length: 5191 [default0]:Skipping sample id=2746246. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2755992. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2746176. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2747268. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2716871. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719001. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2721873. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2733691. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2725866. 
Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2739505. Maximum sequence length: 2049, sample length: 5103 [default0]:Skipping sample id=2724619. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2487536. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2724911. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2734427. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2752875. Maximum sequence length: 2049, sample length: 3691 [default0]:Skipping sample id=2489826. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2724000. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2478125. Maximum sequence length: 2049, sample length: 3007 [default0]:Skipping sample id=2711255. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2716183. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2741528. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2712664. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2755045. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2486058. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2733686. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2732603. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2750972. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2715768. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2748551. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2470068. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2752649. 
Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2746093. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2744785. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2489016. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2725618. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2482421. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2755842. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2744431. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2467294. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2732906. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2741281. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2719160. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2713897. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2719257. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2716249. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2710979. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2734690. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2488275. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2742821. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2720099. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2734547. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2725360. Maximum sequence length: 2049, sample length: 4311 [default0]:Skipping sample id=2478326. 
Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2713784. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2497463. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2491575. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2752318. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2724041. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2743099. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2755037. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2721473. Maximum sequence length: 2049, sample length: 7336 [default0]:Skipping sample id=2720033. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2738900. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2754346. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2722625. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2747667. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2755872. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2741834. Maximum sequence length: 2049, sample length: 4643 [default0]:Skipping sample id=2751052. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739708. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2717021. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2711663. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2727631. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2721033. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2747844. 
Maximum sequence length: 2049, sample length: 5140 [default0]:Skipping sample id=2716041. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2749654. Maximum sequence length: 2049, sample length: 5560 [default0]:Skipping sample id=2756437. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2733702. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2735027. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2723513. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2741387. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2746270. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2756435. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2735511. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2748802. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2716835. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2724192. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2746200. Maximum sequence length: 2049, sample length: 3992 [default0]:Skipping sample id=2728036. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2714333. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2742804. Maximum sequence length: 2049, sample length: 14222 [default0]:Skipping sample id=2718936. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2468935. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2712434. Maximum sequence length: 2049, sample length: 3290 [default0]:Skipping sample id=2749277. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2753119. 
Maximum sequence length: 2049, sample length: 4222 [default0]:Skipping sample id=2727559. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2745987. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2756825. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2746239. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2746695. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2732577. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2731842. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2726472. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2755820. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2749525. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2752291. Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2713473. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2737713. Maximum sequence length: 2049, sample length: 3926 [default0]:Skipping sample id=2714772. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2728792. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2731375. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2715919. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2743897. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2739530. Maximum sequence length: 2049, sample length: 4011 [default0]:Skipping sample id=2740961. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2467971. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2726333. 
Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2721807. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2719944. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2466897. Maximum sequence length: 2049, sample length: 2689 [default0]:Skipping sample id=2741291. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2711878. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2739494. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2716243. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2711885. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2731379. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2744541. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2755330. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2718262. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2734950. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2753633. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2723753. Maximum sequence length: 2049, sample length: 3766 [default0]:Skipping sample id=2755671. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2717209. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2744874. Maximum sequence length: 2049, sample length: 3853 [default0]:Skipping sample id=2736498. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2740560. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2721170. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740400. 
Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2748758. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2488096. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2714475. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2752343. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2725039. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2478547. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2725938. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2747338. Maximum sequence length: 2049, sample length: 3682 [default0]:Skipping sample id=2737078. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2749397. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2711434. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2751562. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2726423. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2726262. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2714421. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2728738. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2720767. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2493206. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2723882. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2714018. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2737867. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2728187. 
Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2737217. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2740818. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2720440. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2739938. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2716197. Maximum sequence length: 2049, sample length: 4041 [default0]:Skipping sample id=2718729. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2731164. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2719723. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2733238. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2749966. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2746151. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2755573. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2738283. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2748917. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2711085. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726385. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2754944. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2751501. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2737134. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2736970. Maximum sequence length: 2049, sample length: 4163 [default0]:Skipping sample id=2724661. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2711158. 
Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2490005. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2732582. Maximum sequence length: 2049, sample length: 6428 [default0]:Skipping sample id=2715310. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2743660. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2739633. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2730071. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2722523. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2748360. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2715433. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2495000. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2466495. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2743974. Maximum sequence length: 2049, sample length: 2809 [default0]:Skipping sample id=2736882. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2731089. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2748803. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2725852. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2726238. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2736751. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2737303. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2715520. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2748419. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2726459. 
Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2490518. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2722331. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2469787. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2727266. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2722045. Maximum sequence length: 2049, sample length: 6941 [default0]:Skipping sample id=2729211. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2745005. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2734178. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2716281. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2743935. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2733564. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2739826. Maximum sequence length: 2049, sample length: 5858 [default0]:Skipping sample id=2731954. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2723781. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2740945. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2746763. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2719046. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2731490. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2752917. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2728846. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2734978. Maximum sequence length: 2049, sample length: 5155 [default0]:Skipping sample id=2722545. 
Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2746531. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2726665. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2479089. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2491119. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2711195. Maximum sequence length: 2049, sample length: 4638 [default0]:Skipping sample id=2738158. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2485304. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2731495. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2741396. Maximum sequence length: 2049, sample length: 3789 [default0]:Skipping sample id=2487956. Maximum sequence length: 2049, sample length: 2744 [default0]:Skipping sample id=2716087. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2714541. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2725700. Maximum sequence length: 2049, sample length: 3586 [default0]:Skipping sample id=2748894. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2743877. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2735138. Maximum sequence length: 2049, sample length: 4523 [default0]:Skipping sample id=2740267. Maximum sequence length: 2049, sample length: 4398 [default0]:Skipping sample id=2723599. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2724118. Maximum sequence length: 2049, sample length: 5380 [default0]:Skipping sample id=2713114. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2738034. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2723798. 
Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2747526. Maximum sequence length: 2049, sample length: 3324 [default0]:Skipping sample id=2721454. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2749172. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2756777. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2715928. Maximum sequence length: 2049, sample length: 4567 [default0]:Skipping sample id=2750909. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2737223. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2735659. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2736693. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2716940. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2716534. Maximum sequence length: 2049, sample length: 6073 [default0]:Skipping sample id=2734173. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2738079. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2740490. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2755092. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2754538. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2717957. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2490293. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2746410. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2754330. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735128. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2718299. 
Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2746867. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2736262. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2712285. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2736760. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2730632. Maximum sequence length: 2049, sample length: 3799 [default0]:Skipping sample id=2727295. Maximum sequence length: 2049, sample length: 4453 [default0]:Skipping sample id=2744513. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2711486. Maximum sequence length: 2049, sample length: 5187 [default0]:Skipping sample id=2743696. Maximum sequence length: 2049, sample length: 4028 [default0]:Skipping sample id=2745759. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2713780. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2731262. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2737368. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2743263. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2727806. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2724346. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2756583. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2749455. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2722827. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2722082. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2712074. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2749920. 
Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2731576. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2712017. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2720469. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2730955. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2481119. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2718441. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2744088. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2713014. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2726700. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2748380. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2716492. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2716735. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2712897. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2734371. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2726752. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2725034. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2730988. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2715881. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2712119. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2715370. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2747867. Maximum sequence length: 2049, sample length: 3954 [default0]:Skipping sample id=2753225. 
Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2741523. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2716413. Maximum sequence length: 2049, sample length: 5106 [default0]:Skipping sample id=2742153. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2738240. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2751731. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2753105. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2732115. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2467110. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2719515. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2753410. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2752190. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2735522. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2716695. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2727174. Maximum sequence length: 2049, sample length: 6924 [default0]:Skipping sample id=2720755. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2730271. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2716579. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2477061. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2743507. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2712947. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2744568. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2712214. 
Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2722477. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2479392. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2731753. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2737831. Maximum sequence length: 2049, sample length: 7071 [default0]:Skipping sample id=2729059. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2724386. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2738846. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2749084. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2490372. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2746807. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2737677. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2750341. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2752974. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2479402. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2735039. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2724168. Maximum sequence length: 2049, sample length: 4098 [default0]:Skipping sample id=2751020. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2740223. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2717641. Maximum sequence length: 2049, sample length: 4386 [default0]:Skipping sample id=2717682. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2477980. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2714369. 
Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2723541. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2726311. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2732530. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2724966. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2712245. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2747273. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2495968. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2743916. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2745895. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2757101. Maximum sequence length: 2049, sample length: 4812 [default0]:Skipping sample id=2744222. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2736695. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2756038. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2733677. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2729808. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2716428. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2718218. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2479194. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2752068. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2740865. Maximum sequence length: 2049, sample length: 5861 [default0]:Skipping sample id=2733274. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2745891. 
Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2722012. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2737850. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2744439. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2723932. Maximum sequence length: 2049, sample length: 6817 [default0]:Skipping sample id=2744103. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2744685. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2469328. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2478257. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2735944. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2720783. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2714504. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2721256. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2747228. Maximum sequence length: 2049, sample length: 4170 [default0]:Skipping sample id=2755736. Maximum sequence length: 2049, sample length: 3744 [default0]:Skipping sample id=2756377. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2747561. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2739929. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2740116. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2746743. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2711286. Maximum sequence length: 2049, sample length: 4514 [default0]:Skipping sample id=2725895. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2716955. 
Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2730602. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2725173. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2742060. Maximum sequence length: 2049, sample length: 4356 [default0]:Skipping sample id=2483708. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2730393. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739226. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2723440. Maximum sequence length: 2049, sample length: 4037 [default0]:Skipping sample id=2715088. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2743795. Maximum sequence length: 2049, sample length: 6215 [default0]:Skipping sample id=2495096. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2735448. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2712727. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2488945. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2741118. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2712828. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2736005. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2745445. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2746441. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2744969. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2731393. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2752675. Maximum sequence length: 2049, sample length: 3974 [default0]:Skipping sample id=2714371. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2750822. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2753060. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2478632. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2720791. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2748338. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2726297. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2499391. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2725061. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2712344. Maximum sequence length: 2049, sample length: 4919 [default0]:Skipping sample id=2750201. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2728672. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2485497. Maximum sequence length: 2049, sample length: 3533 [default0]:Skipping sample id=2729316. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2488057. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2466937. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2718370. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2467128. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2741683. Maximum sequence length: 2049, sample length: 3493 [default0]:Skipping sample id=2752147. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2718721. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2756101. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2742486. 
Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2724743. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2720549. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2726222. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2725855. Maximum sequence length: 2049, sample length: 5264 [default0]:Skipping sample id=2726182. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2748366. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2743634. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2738134. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2727214. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2734043. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2733230. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2745494. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2741196. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2711174. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2478694. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2743164. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2725000. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2489295. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2480436. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2750429. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2750468. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2744705. 
Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2725369. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2752538. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2737764. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2722575. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2721940. Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2734623. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489511. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2724150. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2730501. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2494721. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2755447. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2724521. Maximum sequence length: 2049, sample length: 6646 [default0]:Skipping sample id=2487535. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2743019. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2750690. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2719737. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2728377. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2484268. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2494655. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2751596. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2717700. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2481864. 
Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2745909. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2734641. Maximum sequence length: 2049, sample length: 4590 [default0]:Skipping sample id=2741151. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2716145. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2481031. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2751025. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2726400. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2495956. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2477673. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2492275. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2727925. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2718382. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2717435. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2468729. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2730455. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2468295. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2715989. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2756623. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2491422. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2742380. Maximum sequence length: 2049, sample length: 3247 [default0]:Skipping sample id=2728722. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2752114. 
Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2717352. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2740298. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2721138. Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2755549. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2723828. Maximum sequence length: 2049, sample length: 3985 [default0]:Skipping sample id=2716370. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2712032. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2734418. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2743290. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2738567. Maximum sequence length: 2049, sample length: 4028 [default0]:Skipping sample id=2494275. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2751796. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2495219. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2722416. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2751005. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2745672. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2731425. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2712739. Maximum sequence length: 2049, sample length: 6302 [default0]:Skipping sample id=2715530. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2743036. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2751323. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2488813. 
Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2754313. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2483326. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2756975. Maximum sequence length: 2049, sample length: 6245 [default0]:Skipping sample id=2731835. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2754258. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2732695. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2722472. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2732544. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2721782. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2750549. Maximum sequence length: 2049, sample length: 3902 [default0]:Skipping sample id=2493193. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2754100. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2732616. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2727731. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2751119. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2712553. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2746919. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2498780. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2712024. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2748589. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2757052. Maximum sequence length: 2049, sample length: 4237 [default0]:Skipping sample id=2739519. 
Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2711671. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2733377. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2735862. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2496523. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2490587. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2732179. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2492464. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2718993. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2712334. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2720772. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2726764. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2487470. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2741614. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2746571. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2735283. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2724191. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2493755. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2742055. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2729740. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2498427. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2737868. Maximum sequence length: 2049, sample length: 5262 [default0]:Skipping sample id=2718679. 
Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2744611. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2485177. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2727247. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2736341. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2748404. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2753180. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2468569. Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2483751. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2725969. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2750894. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2717531. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2752451. Maximum sequence length: 2049, sample length: 6528 [default0]:Skipping sample id=2727876. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2725177. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2732635. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2727776. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2732354. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2716388. Maximum sequence length: 2049, sample length: 4411 [default0]:Skipping sample id=2728624. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2718700. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2727023. Maximum sequence length: 2049, sample length: 3556 [default0]:Skipping sample id=2754569. 
Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2718701. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2735691. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2753256. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2738943. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727982. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2754690. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2486263. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2737308. Maximum sequence length: 2049, sample length: 3993 [default0]:Skipping sample id=2746877. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2748527. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2741617. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2468273. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2730895. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2718232. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2726074. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2732599. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2485963. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2742956. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2721430. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2719760. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2479598. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2494755. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2719849. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2753951. Maximum sequence length: 2049, sample length: 3647 [default0]:Skipping sample id=2750104. Maximum sequence length: 2049, sample length: 4605 [default0]:Skipping sample id=2716385. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2738226. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2481848. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2750380. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2757083. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2725228. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2712215. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2756774. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2726933. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2745259. Maximum sequence length: 2049, sample length: 4290 [default0]:Skipping sample id=2722748. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2720060. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2716020. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2745161. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2717811. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2719360. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2728612. Maximum sequence length: 2049, sample length: 4948 [default0]:Skipping sample id=2755726. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2496967. 
Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2751520. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2729851. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2722018. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2739434. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2744855. Maximum sequence length: 2049, sample length: 3140 [default0]:Skipping sample id=2724894. Maximum sequence length: 2049, sample length: 3818 [default0]:Skipping sample id=2711058. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2731859. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2746854. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2466737. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2756611. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2488147. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2481619. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2721199. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2741284. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2750529. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2477530. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2746097. Maximum sequence length: 2049, sample length: 5487 [default0]:Skipping sample id=2751936. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2749844. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2724321. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2711470. 
Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2725705. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2731766. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2721098. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2747834. Maximum sequence length: 2049, sample length: 3652 [default0]:Skipping sample id=2733626. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2746751. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726867. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2720714. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2755686. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2736056. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2734677. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2746862. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2725951. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2736123. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2719007. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2722166. Maximum sequence length: 2049, sample length: 4470 [default0]:Skipping sample id=2744288. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2747986. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2484447. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2741367. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2734810. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2738651. 
Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2753319. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2732959. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2743863. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2755195. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2734018. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2731646. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2496288. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2736792. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2491585. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2745932. Maximum sequence length: 2049, sample length: 5957 [default0]:Skipping sample id=2713285. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2717278. Maximum sequence length: 2049, sample length: 5291 [default0]:Skipping sample id=2721209. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2466882. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2714700. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2493724. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2725040. Maximum sequence length: 2049, sample length: 3711 [default0]:Skipping sample id=2754226. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2711729. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2713047. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2719189. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2754733. 
Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2487391. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2467790. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2728435. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2715875. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2723940. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2734362. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2713232. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2734701. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2714761. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2718927. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2714904. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2718913. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2736418. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2490810. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2724696. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2731138. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2718477. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2713748. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2721751. Maximum sequence length: 2049, sample length: 4285 [default0]:Skipping sample id=2737631. Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2716615. Maximum sequence length: 2049, sample length: 5187 [default0]:Skipping sample id=2719528. 
Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2736603. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733095. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2743541. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2733748. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2752146. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2718847. Maximum sequence length: 2049, sample length: 4613 [default0]:Skipping sample id=2736520. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2716274. Maximum sequence length: 2049, sample length: 5202 [default0]:Skipping sample id=2488043. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2748952. Maximum sequence length: 2049, sample length: 3874 [default0]:Skipping sample id=2722871. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2745693. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2754505. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2722503. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2745398. Maximum sequence length: 2049, sample length: 4198 [default0]:Skipping sample id=2736544. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2741900. Maximum sequence length: 2049, sample length: 5836 [default0]:Skipping sample id=2753585. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2727961. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2499417. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2732790. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2714125. 
Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2734745. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2493618. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2744865. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2754891. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2483482. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715802. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2730721. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2735816. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2481250. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756472. Maximum sequence length: 2049, sample length: 3684 [default0]:Skipping sample id=2717090. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2719739. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2728073. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2744766. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2732114. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2732015. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2744655. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2752138. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2717378. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2744531. Maximum sequence length: 2049, sample length: 5443 [default0]:Skipping sample id=2742549. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2498346. 
Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2725380. Maximum sequence length: 2049, sample length: 3533 [default0]:Skipping sample id=2714093. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2751312. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2722627. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2754764. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2719604. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2479250. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2737320. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2731761. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2491112. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2468656. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2487619. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2738088. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2740927. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2715218. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2746956. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2725254. Maximum sequence length: 2049, sample length: 3213 [default0]:Skipping sample id=2711209. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2719916. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2744864. Maximum sequence length: 2049, sample length: 3376 [default0]:Skipping sample id=2741932. Maximum sequence length: 2049, sample length: 4699 [default0]:Skipping sample id=2732740. 
Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2724453. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2744899. Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2710990. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2484254. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2741319. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2741859. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2495692. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2738615. Maximum sequence length: 2049, sample length: 4819 [default0]:Skipping sample id=2714448. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2749801. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2468873. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2481237. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2739962. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2721765. Maximum sequence length: 2049, sample length: 4698 [default0]:Skipping sample id=2714684. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2728273. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2744009. Maximum sequence length: 2049, sample length: 4187 [default0]:Skipping sample id=2719042. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2732484. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2730820. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2749561. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2716204. 
Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2736524. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2743821. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2751767. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2721642. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2737802. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2487605. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2731870. Maximum sequence length: 2049, sample length: 3788 [default0]:Skipping sample id=2721762. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2713671. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2733011. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2736692. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2715246. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2466133. Maximum sequence length: 2049, sample length: 3166 [default0]:Skipping sample id=2734174. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2482272. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2750491. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2746016. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2726488. Maximum sequence length: 2049, sample length: 4852 [default0]:Skipping sample id=2731529. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2719543. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2736309. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2477464. 
Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2752133. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2749781. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2737909. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2735462. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2490334. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2727198. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2753771. Maximum sequence length: 2049, sample length: 6009 [default0]:Skipping sample id=2724744. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2746117. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2756881. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2488278. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2735877. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2722409. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2466492. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2726144. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2737814. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2729430. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2744355. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2731797. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2745734. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2743579. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2486431. 
Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2730149. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2755732. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2748452. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2739073. Maximum sequence length: 2049, sample length: 3449 [default0]:Skipping sample id=2714995. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2724729. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2718914. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2744202. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2470101. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2736569. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2712211. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2718753. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2738075. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2725742. Maximum sequence length: 2049, sample length: 5400 [default0]:Skipping sample id=2743163. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2755316. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2750827. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2715045. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2480445. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2734450. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2719140. Maximum sequence length: 2049, sample length: 4370 [default0]:Skipping sample id=2722661. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2753890. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2747605. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2467083. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2716773. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2749854. Maximum sequence length: 2049, sample length: 4429 [default0]:Skipping sample id=2732331. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2483880. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2710971. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2731291. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2721354. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2720532. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2748134. Maximum sequence length: 2049, sample length: 5334 [default0]:Skipping sample id=2744787. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2738442. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2752158. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2739139. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2728654. Maximum sequence length: 2049, sample length: 4950 [default0]:Skipping sample id=2718830. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2489393. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2742775. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2714260. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2495955. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737429. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2712341. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2483852. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2466786. Maximum sequence length: 2049, sample length: 4092 [default0]:Skipping sample id=2499111. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2719479. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2754183. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2752894. Maximum sequence length: 2049, sample length: 3523 [default0]:Skipping sample id=2718396. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2729055. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2744314. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2712091. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2478428. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2737894. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715854. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2713500. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2734210. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2487115. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2736133. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2469751. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2715276. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2753715. 
Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2751896. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2719453. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2749044. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2721467. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2748800. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2744580. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2735055. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2489841. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2734384. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2747018. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2754398. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2743440. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2753179. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2484488. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2723093. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2714461. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2722613. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2726027. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2748096. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2732038. Maximum sequence length: 2049, sample length: 5184 [default0]:Skipping sample id=2754086. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2719443. 
Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2715402. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2739876. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2714322. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2731618. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2752003. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2727230. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2720344. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2740276. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2756538. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2712916. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2720601. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2478265. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2739380. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2478236. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2711784. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2736114. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2716338. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2745145. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2715722. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2715856. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2734636. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2485544. 
Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2733761. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2750290. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2713502. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2716369. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2721889. Maximum sequence length: 2049, sample length: 4195 [default0]:Skipping sample id=2478252. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2714123. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2752019. Maximum sequence length: 2049, sample length: 4119 [default0]:Skipping sample id=2725798. Maximum sequence length: 2049, sample length: 4079 [default0]:Skipping sample id=2466831. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2753700. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2727966. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2754169. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2720206. Maximum sequence length: 2049, sample length: 7785 [default0]:Skipping sample id=2736086. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2718662. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2741733. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2721012. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2467292. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2497985. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2717990. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2722981. 
Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2727599. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2498730. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2754009. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2494626. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2731829. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2725577. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2468307. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2480994. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2729579. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2728886. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2491680. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2752090. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2722579. Maximum sequence length: 2049, sample length: 6499 [default0]:Skipping sample id=2726377. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2715914. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2750645. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2727682. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2753527. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2493718. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2746773. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2470030. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2466166. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2742278. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2467269. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2477830. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2470431. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2714897. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2717083. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2711647. Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2491068. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2468863. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2753787. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2483961. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2742005. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2477569. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2757032. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2731998. Maximum sequence length: 2049, sample length: 3452 [default0]:Skipping sample id=2752824. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2720569. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2740968. Maximum sequence length: 2049, sample length: 5486 [default0]:Skipping sample id=2754202. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2734066. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2755020. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2712231. 
Maximum sequence length: 2049, sample length: 5517 [default0]:Skipping sample id=2725542. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2466119. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2726955. Maximum sequence length: 2049, sample length: 3860 [default0]:Skipping sample id=2719381. Maximum sequence length: 2049, sample length: 4537 [default0]:Skipping sample id=2715598. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2724362. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2711353. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2747636. Maximum sequence length: 2049, sample length: 4315 [default0]:Skipping sample id=2482029. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2721706. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2718602. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2467226. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2740573. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2725046. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2723900. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2712352. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2711433. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2711028. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2732401. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2722330. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2497979. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2714634. 
Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2711457. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2496795. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2743483. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2742100. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2751470. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2753970. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2714477. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711765. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2498177. Maximum sequence length: 2049, sample length: 3672 [default0]:Skipping sample id=2754830. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2730500. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2750111. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2731570. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2715393. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2727392. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2752288. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2732376. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2745031. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2717365. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2753482. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2721122. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2737484. 
Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2743134. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2749992. Maximum sequence length: 2049, sample length: 5653 [default0]:Skipping sample id=2715323. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2750495. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2723824. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2715522. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2490527. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2735258. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2730562. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2722546. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2712007. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2718270. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2744191. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2726379. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2754156. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2729144. Maximum sequence length: 2049, sample length: 5573 [default0]:Skipping sample id=2730201. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2736380. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2755093. Maximum sequence length: 2049, sample length: 3783 [default0]:Skipping sample id=2744506. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2489499. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731026. 
Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2725219. Maximum sequence length: 2049, sample length: 4839 [default0]:Skipping sample id=2721257. Maximum sequence length: 2049, sample length: 4378 [default0]:Skipping sample id=2712347. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2732506. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2482656. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2713933. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2747796. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2724187. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2494437. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2492709. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2719462. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2496971. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2750153. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2723783. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2719051. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2496881. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2721233. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2720287. Maximum sequence length: 2049, sample length: 5984 [default0]:Skipping sample id=2736313. Maximum sequence length: 2049, sample length: 3706 [default0]:Skipping sample id=2723164. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2724733. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2714840. 
Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733993. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2717340. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2740690. Maximum sequence length: 2049, sample length: 4367 [default0]:Skipping sample id=2754887. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2484650. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2749440. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2727984. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2756084. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2731395. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2756888. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2737310. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2723556. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2711052. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2723511. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2492246. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2734628. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2714825. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2720726. Maximum sequence length: 2049, sample length: 5257 [default0]:Skipping sample id=2751260. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2749176. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2494329. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2711987. 
Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2726518. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2744768. Maximum sequence length: 2049, sample length: 8471 [default0]:Skipping sample id=2736746. Maximum sequence length: 2049, sample length: 4784 [default0]:Skipping sample id=2744738. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2723752. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2739122. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2719608. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2726712. Maximum sequence length: 2049, sample length: 4428 [default0]:Skipping sample id=2738688. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2725113. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2725885. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2495448. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2725684. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2495644. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2717812. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2715770. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2751547. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2711692. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2717007. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2754526. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2730306. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2754935. 
Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2725698. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2740478. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2752085. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2719837. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2489000. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2726045. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2753544. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2711566. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2720954. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2729290. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2753716. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2740627. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2733337. Maximum sequence length: 2049, sample length: 4004 [default0]:Skipping sample id=2715384. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2753836. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2738557. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2739008. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2711226. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2711007. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2733534. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2741935. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2720170. 
Maximum sequence length: 2049, sample length: 5231 [default0]:Skipping sample id=2482580. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2745589. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2752578. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2729595. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2478931. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2742940. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2466919. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2726088. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2718786. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2748850. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2740393. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2497437. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2729540. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2494758. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2751199. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2732827. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2748763. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2719063. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2754267. Maximum sequence length: 2049, sample length: 4192 [default0]:Skipping sample id=2750772. Maximum sequence length: 2049, sample length: 3486 [default0]:Skipping sample id=2753714. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2729950. 
Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2753533. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2734620. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2718719. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2722263. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2714273. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2727099. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2738748. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2719218. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2753403. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2753392. Maximum sequence length: 2049, sample length: 4870 [default0]:Skipping sample id=2751172. Maximum sequence length: 2049, sample length: 3744 [default0]:Skipping sample id=2747911. Maximum sequence length: 2049, sample length: 4446 [default0]:Skipping sample id=2714930. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2495797. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2723664. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2743179. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2735088. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2723962. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2750392. Maximum sequence length: 2049, sample length: 7513 [default0]:Skipping sample id=2753892. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2725156. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2755719. 
Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2749226. Maximum sequence length: 2049, sample length: 6167 [default0]:Skipping sample id=2714367. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2724250. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2724978. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2731566. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2745015. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2488733. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2485626. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2715948. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2494245. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2752856. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2713844. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2717923. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2727129. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2716837. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2748807. Maximum sequence length: 2049, sample length: 3958 [default0]:Skipping sample id=2749350. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2482506. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2717913. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2491819. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2741072. Maximum sequence length: 2049, sample length: 4234 [default0]:Skipping sample id=2749129. 
Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2714079. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2720013. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2754365. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2725192. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2746201. Maximum sequence length: 2049, sample length: 4236 [default0]:Skipping sample id=2714451. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2744456. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2713128. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2751570. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2749638. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2733083. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2732319. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2748566. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2715853. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2732022. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2734696. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2732358. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2468439. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2482245. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2482779. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2487808. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2736284. 
Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2711284. Maximum sequence length: 2049, sample length: 5106 [default0]:Skipping sample id=2711293. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2731508. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2729759. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2717867. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2729079. Maximum sequence length: 2049, sample length: 5115 [default0]:Skipping sample id=2721628. Maximum sequence length: 2049, sample length: 3210 [default0]:Skipping sample id=2730744. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2714318. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2723839. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2714468. Maximum sequence length: 2049, sample length: 3651 [default0]:Skipping sample id=2497966. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2713900. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2754023. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2749269. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2469209. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2732605. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2741695. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2744309. Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2478859. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2482906. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2711247. 
Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2498143. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2735806. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2745355. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2750960. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2721421. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2471263. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2722171. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2728835. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2723176. Maximum sequence length: 2049, sample length: 4941 [default0]:Skipping sample id=2741531. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2753816. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2733252. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2479555. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2755604. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2736303. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2718427. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2718471. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2712838. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2726119. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2724007. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2738014. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2721529. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2720463. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2736699. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2742720. Maximum sequence length: 2049, sample length: 4689 [default0]:Skipping sample id=2720738. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2721004. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2750367. Maximum sequence length: 2049, sample length: 3184 [default0]:Skipping sample id=2732197. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2741179. Maximum sequence length: 2049, sample length: 3764 [default0]:Skipping sample id=2728364. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2748167. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2727569. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2747233. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2724656. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2727560. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718462. Maximum sequence length: 2049, sample length: 3981 [default0]:Skipping sample id=2741904. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2726771. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2746116. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2737032. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732847. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2719309. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2732539. 
Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2753353. Maximum sequence length: 2049, sample length: 6262 [default0]:Skipping sample id=2490666. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2716147. Maximum sequence length: 2049, sample length: 5013 [default0]:Skipping sample id=2712405. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2713143. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2738550. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2715918. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2731321. Maximum sequence length: 2049, sample length: 5350 [default0]:Skipping sample id=2485562. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2496141. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2748989. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2740788. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2756505. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2489406. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2730317. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2737073. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2735651. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2748328. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2736985. Maximum sequence length: 2049, sample length: 6488 [default0]:Skipping sample id=2726338. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2754218. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2744120. 
Maximum sequence length: 2049, sample length: 4220 [default0]:Skipping sample id=2482743. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2722355. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2711954. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2724033. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2711624. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2753106. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2726203. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2731872. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2741689. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2726501. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2753201. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2756416. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2715417. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2728229. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2731021. Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2726101. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2755811. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2727388. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2734360. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2495691. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2721024. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2737506. 
Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2715403. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2714757. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2713373. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2730927. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2490848. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2729994. Maximum sequence length: 2049, sample length: 4975 [default0]:Skipping sample id=2732730. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2721796. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2752011. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2727674. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2719901. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2751243. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2739191. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2751805. Maximum sequence length: 2049, sample length: 3340 [default0]:Skipping sample id=2749338. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2488102. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2726147. Maximum sequence length: 2049, sample length: 3884 [default0]:Skipping sample id=2468753. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2754900. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2730467. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2722910. Maximum sequence length: 2049, sample length: 4809 [default0]:Skipping sample id=2724679. 
Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2734040. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2733956. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2753887. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2754444. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2743010. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2732327. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2747901. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2725792. Maximum sequence length: 2049, sample length: 3934 [default0]:Skipping sample id=2733320. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2721018. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2746657. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2757012. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2721336. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2468632. Maximum sequence length: 2049, sample length: 3532 [default0]:Skipping sample id=2716635. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2744698. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2713524. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2732243. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2728970. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2711733. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2717797. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2752247. 
Maximum sequence length: 2049, sample length: 7153 [default0]:Skipping sample id=2749895. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2734665. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2742960. Maximum sequence length: 2049, sample length: 7070 [default0]:Skipping sample id=2727962. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2753656. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2728123. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2744478. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2742993. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2729222. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2746931. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2713728. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2491196. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2719248. Maximum sequence length: 2049, sample length: 8234 [default0]:Skipping sample id=2747584. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2753746. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2730504. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2724950. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2746056. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2742408. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2741037. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2721093. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2738328. 
Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2755211. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2494228. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2749405. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2742399. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2744671. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2727888. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2727248. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2711290. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2482343. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2744077. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2729761. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726673. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2489160. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2723758. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2741707. Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2744666. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2735444. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2745173. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2714796. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2727202. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2477580. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2711863. 
Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2735618. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2723576. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2754812. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2739402. Maximum sequence length: 2049, sample length: 5442 [default0]:Skipping sample id=2495305. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2728573. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2713992. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2725646. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2714450. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2741330. Maximum sequence length: 2049, sample length: 6153 [default0]:Skipping sample id=2738877. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2729926. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2726462. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2495560. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2716636. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2726343. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2716711. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2729645. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2720871. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2735400. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2753218. Maximum sequence length: 2049, sample length: 7077 [default0]:Skipping sample id=2712937. 
Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2730489. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2718861. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2720898. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2745143. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2737137. Maximum sequence length: 2049, sample length: 3958 [default0]:Skipping sample id=2717554. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2714652. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2745192. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2737288. Maximum sequence length: 2049, sample length: 3967 [default0]:Skipping sample id=2470145. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2747345. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2492264. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2719080. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2714353. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2490620. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2486336. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2721114. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2725522. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2494879. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2725639. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2722633. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2717226. 
Maximum sequence length: 2049, sample length: 4068 [default0]:Skipping sample id=2715065. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2734877. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2747633. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2747385. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2737504. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2731332. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2741512. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2468507. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2752268. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2718158. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2726750. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2735363. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2720486. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2723447. Maximum sequence length: 2049, sample length: 6498 [default0]:Skipping sample id=2466229. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2722544. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2736227. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2747202. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2719147. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2714174. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2751728. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2734527. 
Maximum sequence length: 2049, sample length: 4360 [default0]:Skipping sample id=2735390. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2747920. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2749583. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2750943. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2722290. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2711764. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2723929. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2723135. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2724086. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2748731. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2716546. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2744853. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2726227. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2725098. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2712192. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2724904. Maximum sequence length: 2049, sample length: 4029 [default0]:Skipping sample id=2742099. Maximum sequence length: 2049, sample length: 4163 [default0]:Skipping sample id=2750371. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2752861. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2465832. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2718215. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2738544. 
Maximum sequence length: 2049, sample length: 4198 [default0]:Skipping sample id=2752255. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2732523. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2719649. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2746169. Maximum sequence length: 2049, sample length: 5338 [default0]:Skipping sample id=2724657. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2747415. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2746375. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2495402. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2723841. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2742997. Maximum sequence length: 2049, sample length: 5262 [default0]:Skipping sample id=2736238. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2743105. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2719736. Maximum sequence length: 2049, sample length: 4985 [default0]:Skipping sample id=2720481. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2731792. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2752148. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2734464. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2715784. Maximum sequence length: 2049, sample length: 4496 [default0]:Skipping sample id=2750684. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2740997. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2732841. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2470055. 
Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2742087. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2746108. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2486999. Maximum sequence length: 2049, sample length: 4285 [default0]:Skipping sample id=2714127. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2749604. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2744097. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2753592. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2746114. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2736422. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2717413. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2750423. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2742934. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2752448. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2726327. Maximum sequence length: 2049, sample length: 5064 [default0]:Skipping sample id=2729328. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2711428. Maximum sequence length: 2049, sample length: 3289 [default0]:Skipping sample id=2747528. Maximum sequence length: 2049, sample length: 5848 [default0]:Skipping sample id=2498932. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2752105. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2749557. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2734896. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2485738. 
Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2716648. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2723889. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2735928. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2493679. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2712513. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2720625. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2717526. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2752544. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2754868. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2732267. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2752358. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2735177. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2735498. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2732857. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2751237. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2750963. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2741639. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2744367. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2482798. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2469759. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2725151. Maximum sequence length: 2049, sample length: 6633 [default0]:Skipping sample id=2735777. 
Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2753451. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2737646. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2732738. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2739648. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2713137. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2714430. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2732990. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2726060. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2741988. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2729025. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2715152. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2729337. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2728753. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2717486. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2720609. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2749621. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2755206. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2739333. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2717372. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2717696. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2746962. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2718880. 
Maximum sequence length: 2049, sample length: 5688 [default0]:Skipping sample id=2733411. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2721063. Maximum sequence length: 2049, sample length: 5360 [default0]:Skipping sample id=2729375. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2725920. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2756442. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2736999. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2728178. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2752100. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2730073. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2754657. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2491246. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2737979. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2712782. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2711840. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2716627. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2722244. Maximum sequence length: 2049, sample length: 5188 [default0]:Skipping sample id=2725503. Maximum sequence length: 2049, sample length: 3816 [default0]:Skipping sample id=2747379. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2483611. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2752246. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2755431. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2749373. 
Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2727968. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2712899. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2468414. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2494012. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2748707. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2483756. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2728515. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2734340. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2741710. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2721901. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2717513. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2496548. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2469306. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2715426. Maximum sequence length: 2049, sample length: 4157 [default0]:Skipping sample id=2718513. Maximum sequence length: 2049, sample length: 4473 [default0]:Skipping sample id=2728884. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2737586. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2718079. Maximum sequence length: 2049, sample length: 5439 [default0]:Skipping sample id=2721324. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2713477. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2718923. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2734812. 
Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2717809. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2730954. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2718506. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2734021. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2751789. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2735303. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2715173. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2714692. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2748241. Maximum sequence length: 2049, sample length: 5494 [default0]:Skipping sample id=2713235. Maximum sequence length: 2049, sample length: 4441 [default0]:Skipping sample id=2740025. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2730779. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2752467. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2713320. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2742989. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2727355. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2752919. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2712900. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2732691. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2754699. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2723356. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2716165. 
Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2740657. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2734968. Maximum sequence length: 2049, sample length: 6628 [default0]:Skipping sample id=2735910. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2716631. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2742280. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2730023. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2713225. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2721050. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2712536. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2744575. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2478230. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2749320. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2747690. Maximum sequence length: 2049, sample length: 6767 [default0]:Skipping sample id=2468790. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2723990. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2714893. Maximum sequence length: 2049, sample length: 6626 [default0]:Skipping sample id=2716174. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2754531. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2745651. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2751583. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2755938. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2718864. 
Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2737674. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2721770. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2716248. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2752836. Maximum sequence length: 2049, sample length: 4040 [default0]:Skipping sample id=2725554. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2740384. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2724466. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2477472. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2711000. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2732020. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2736912. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2755163. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2718116. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2754867. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2751253. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2727448. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2747564. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2736508. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2752162. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2716605. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2739624. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2746802. 
Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2479422. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2714201. Maximum sequence length: 2049, sample length: 4703 [default0]:Skipping sample id=2714099. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2724094. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2738410. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2717598. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2753765. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2741167. Maximum sequence length: 2049, sample length: 3943 [default0]:Skipping sample id=2739794. Maximum sequence length: 2049, sample length: 3700 [default0]:Skipping sample id=2733322. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2737774. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2725403. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2484115. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2726790. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2738108. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2726252. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2713175. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2722189. Maximum sequence length: 2049, sample length: 3764 [default0]:Skipping sample id=2478438. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2494608. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2731622. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2727812. 
Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2716084. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2731203. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2730792. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2740991. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2471062. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2753062. Maximum sequence length: 2049, sample length: 14228 [default0]:Skipping sample id=2471043. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2733921. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2725814. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2728876. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2729867. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2745172. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2486685. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2715334. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2747341. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2490277. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2743990. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2718315. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2726346. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2732822. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2751694. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2752647. 
Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2731741. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2741826. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2730697. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2753156. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2746074. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2724140. Maximum sequence length: 2049, sample length: 4677 [default0]:Skipping sample id=2729269. Maximum sequence length: 2049, sample length: 4660 [default0]:Skipping sample id=2724916. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2733354. Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2756953. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2731827. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2725437. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2746386. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2477571. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2750347. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2718291. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2727394. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2749565. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2733220. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2728632. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2754815. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2744672. 
Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2745224. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2496678. Maximum sequence length: 2049, sample length: 3417 [default0]:Skipping sample id=2731511. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2745572. Maximum sequence length: 2049, sample length: 5058 [default0]:Skipping sample id=2741579. Maximum sequence length: 2049, sample length: 5234 [default0]:Skipping sample id=2720479. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2724056. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2714617. Maximum sequence length: 2049, sample length: 6614 [default0]:Skipping sample id=2731268. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2471100. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2715828. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2716108. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2466517. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2714180. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2470007. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2721022. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2752707. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2746815. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2731630. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2727139. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2468431. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2731647. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2727712. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2743091. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2754176. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2715026. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2742276. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2729913. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2734248. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2748330. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2738072. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2744011. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2496904. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2715083. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2483008. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2739674. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2490281. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2749216. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2479507. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2734488. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2744707. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2743280. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2750196. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2721331. 
Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2723301. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2716037. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2747068. Maximum sequence length: 2049, sample length: 5711 [default0]:Skipping sample id=2722117. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2734605. Maximum sequence length: 2049, sample length: 5807 [default0]:Skipping sample id=2732664. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2712789. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2733879. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2729506. Maximum sequence length: 2049, sample length: 8224 [default0]:Skipping sample id=2751104. Maximum sequence length: 2049, sample length: 3918 [default0]:Skipping sample id=2746089. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2745068. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2712690. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2721981. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2719600. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2488279. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711992. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2728542. Maximum sequence length: 2049, sample length: 7200 [default0]:Skipping sample id=2746628. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2750066. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2720996. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2750868. 
Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2745953. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2721047. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2722020. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2738193. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2741538. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2498934. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2499181. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2733653. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2712397. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2746678. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2739495. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2753910. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2741633. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2740764. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2732566. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2740797. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2739776. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2713701. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2733851. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2715009. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2734588. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2711945. 
Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2721043. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2488824. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2736076. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2742933. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2720920. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2718822. Maximum sequence length: 2049, sample length: 4981 [default0]:Skipping sample id=2478810. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2743361. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2728750. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2711739. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2727762. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2716460. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2726596. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2734260. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2726457. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2738071. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2740906. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2490711. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2750891. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2737021. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2749570. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2750520. 
Maximum sequence length: 2049, sample length: 6487 [default0]:Skipping sample id=2744289. Maximum sequence length: 2049, sample length: 5103 [default0]:Skipping sample id=2711072. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2751330. Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2744430. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2491885. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2727477. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2739408. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2743823. Maximum sequence length: 2049, sample length: 3866 [default0]:Skipping sample id=2746768. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2492866. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2743907. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2736864. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2715947. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2478768. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2715778. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2736553. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2716277. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2722913. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2729251. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2727837. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2497296. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2722424. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2745017. Maximum sequence length: 2049, sample length: 4951 [default0]:Skipping sample id=2751735. Maximum sequence length: 2049, sample length: 2948 [default0]:Skipping sample id=2740210. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2751978. Maximum sequence length: 2049, sample length: 3070 [default0]:Skipping sample id=2729449. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2489257. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2723539. Maximum sequence length: 2049, sample length: 3920 [default0]:Skipping sample id=2734496. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2746963. Maximum sequence length: 2049, sample length: 4650 [default0]:Skipping sample id=2746734. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2725327. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2757026. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2756421. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2726094. Maximum sequence length: 2049, sample length: 3663 [default0]:Skipping sample id=2490022. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2754783. Maximum sequence length: 2049, sample length: 5558 [default0]:Skipping sample id=2748284. Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2756885. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2739085. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2752363. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2720931. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2743453. 
Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2724719. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2718007. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2731464. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2753407. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2748463. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2713720. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2720516. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2735121. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2752265. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2724217. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2716776. Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2744558. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2479875. Maximum sequence length: 2049, sample length: 3528 [default0]:Skipping sample id=2482192. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2746542. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2726766. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2738699. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2494726. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2731266. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2720819. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712173. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2711970. 
Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2487739. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2488982. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2741324. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2754624. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2720761. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2725193. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2736916. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2725901. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2714701. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2749184. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2484965. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2722353. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2746053. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2732502. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2719267. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2716111. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2742127. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2725341. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2499022. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2481197. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2734310. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2719308. 
Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2716374. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2722407. Maximum sequence length: 2049, sample length: 5298 [default0]:Skipping sample id=2748642. Maximum sequence length: 2049, sample length: 6011 [default0]:Skipping sample id=2722497. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2740549. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2724263. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2723089. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2735680. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2717848. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2477337. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2466283. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2733931. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2727030. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2718584. Maximum sequence length: 2049, sample length: 5717 [default0]:Skipping sample id=2749813. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2742932. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2717288. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2713722. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2711502. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2479776. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2717716. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2755964. 
Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2755722. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2716235. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2740680. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2752521. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2471200. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2479225. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2741725. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2721015. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2720006. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2723047. Maximum sequence length: 2049, sample length: 4092 [default0]:Skipping sample id=2494917. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2714646. Maximum sequence length: 2049, sample length: 4022 [default0]:Skipping sample id=2748232. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2467388. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2740503. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2745297. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2728282. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2718292. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2735180. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2732685. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2724852. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2740134. 
Maximum sequence length: 2049, sample length: 7506 [default0]:Skipping sample id=2478785. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2485719. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2713552. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2738760. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2721412. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2726015. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2738840. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2728760. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2713515. Maximum sequence length: 2049, sample length: 5258 [default0]:Skipping sample id=2717517. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2489966. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2716065. Maximum sequence length: 2049, sample length: 5988 [default0]:Skipping sample id=2748228. Maximum sequence length: 2049, sample length: 3676 [default0]:Skipping sample id=2734502. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2483507. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2743690. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2725663. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2480116. Maximum sequence length: 2049, sample length: 3331 [default0]:Skipping sample id=2752308. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2728408. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2747372. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2730001. 
Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2719969. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2741802. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2745253. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2724457. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2728622. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2715654. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2756184. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2481810. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2738778. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2713314. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2724688. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2730816. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2721149. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2745404. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2498635. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2715608. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2755403. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2740345. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2745148. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2725223. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2740048. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2712682. 
Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2736478. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2740714. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2711119. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2731133. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2478134. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2739428. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2733982. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2735900. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2731574. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2739583. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2497199. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2750052. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2738080. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2721357. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2726718. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2753832. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2738582. Maximum sequence length: 2049, sample length: 3142 [default0]:Skipping sample id=2754203. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2743635. Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2749697. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2740125. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2733592. 
Maximum sequence length: 2049, sample length: 4858 [default0]:Skipping sample id=2498889. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2718099. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2712743. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2752227. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2492943. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2742603. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2755997. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2485270. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2756689. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2721189. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2737118. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2723452. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2732094. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2744780. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2716932. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2469098. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2736197. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2719169. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2739846. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2750253. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2717778. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2726613. 
Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2487506. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2488716. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2744845. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2727134. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2742349. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2730348. Maximum sequence length: 2049, sample length: 4822 [default0]:Skipping sample id=2751802. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2736183. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2752847. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2729195. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2492053. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2729486. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2713980. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2713533. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2743822. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2477376. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2478512. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2735575. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2718448. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2750488. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2718851. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2730566. 
Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2743363. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2482802. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2754777. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2747339. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2747585. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2736810. Maximum sequence length: 2049, sample length: 5703 [default0]:Skipping sample id=2745823. Maximum sequence length: 2049, sample length: 3655 [default0]:Skipping sample id=2470376. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2750179. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2753626. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2490639. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2753853. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2729170. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2712068. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2736084. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2711127. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2726835. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2727571. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2735897. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2485437. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2715832. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2470059. 
Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2737493. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2466351. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2738219. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2753821. Maximum sequence length: 2049, sample length: 3941 [default0]:Skipping sample id=2716595. Maximum sequence length: 2049, sample length: 5789 [default0]:Skipping sample id=2743571. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2730702. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2711005. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2730551. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2753660. Maximum sequence length: 2049, sample length: 5158 [default0]:Skipping sample id=2744895. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2737965. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2737448. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2721738. Maximum sequence length: 2049, sample length: 4266 [default0]:Skipping sample id=2753053. Maximum sequence length: 2049, sample length: 3552 [default0]:Skipping sample id=2754502. Maximum sequence length: 2049, sample length: 6058 [default0]:Skipping sample id=2755212. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2477739. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2714373. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2753252. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2742356. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2728455. 
Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2755203. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2752624. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2747410. Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2716187. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2741997. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2723804. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2736394. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2489558. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2722847. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2755617. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2492621. Maximum sequence length: 2049, sample length: 4275 [default0]:Skipping sample id=2736294. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2727138. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2739939. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2734956. Maximum sequence length: 2049, sample length: 6853 [default0]:Skipping sample id=2729165. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2753506. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2723670. Maximum sequence length: 2049, sample length: 3033 [default0]:Skipping sample id=2756800. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2757006. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2495439. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2746191. 
Maximum sequence length: 2049, sample length: 4551 [default0]:Skipping sample id=2749961. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2729720. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2745607. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2477526. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2733905. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2723527. Maximum sequence length: 2049, sample length: 3488 [default0]:Skipping sample id=2730245. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2739270. Maximum sequence length: 2049, sample length: 3718 [default0]:Skipping sample id=2713144. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2495378. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2753327. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2479450. Maximum sequence length: 2049, sample length: 4323 [default0]:Skipping sample id=2729912. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2729381. Maximum sequence length: 2049, sample length: 8496 [default0]:Skipping sample id=2730645. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2744945. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2736720. Maximum sequence length: 2049, sample length: 4545 [default0]:Skipping sample id=2721168. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2739576. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2744268. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2729801. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2484383. 
Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2481377. Maximum sequence length: 2049, sample length: 4280 [default0]:Skipping sample id=2720393. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2724684. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2467681. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2468179. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2748822. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2718430. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2712805. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2714885. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2754277. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2743850. Maximum sequence length: 2049, sample length: 4809 [default0]:Skipping sample id=2737385. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2730336. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2746805. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2735297. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2744939. Maximum sequence length: 2049, sample length: 5167 [default0]:Skipping sample id=2490901. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2725682. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2744894. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2494182. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2755090. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2717579. 
Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2718931. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2714767. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2467014. Maximum sequence length: 2049, sample length: 3593 [default0]:Skipping sample id=2466208. Maximum sequence length: 2049, sample length: 3519 [default0]:Skipping sample id=2714833. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2713162. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2742977. Maximum sequence length: 2049, sample length: 5801 [default0]:Skipping sample id=2477307. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2731426. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2727980. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2728101. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2715203. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2728707. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2748291. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2753751. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741154. Maximum sequence length: 2049, sample length: 5046 [default0]:Skipping sample id=2742583. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2722481. Maximum sequence length: 2049, sample length: 5465 [default0]:Skipping sample id=2740371. Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2745711. Maximum sequence length: 2049, sample length: 3464 [default0]:Skipping sample id=2741573. Maximum sequence length: 2049, sample length: 4385 [default0]:Skipping sample id=2728860. 
Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2471128. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2732765. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2716112. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2733242. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2746391. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2493536. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2470134. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2467722. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2470594. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2739327. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2482366. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2731219. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2736663. Maximum sequence length: 2049, sample length: 4838 [default0]:Skipping sample id=2483212. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2746161. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2731061. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2735454. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2490209. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2495627. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2486902. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2471296. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2487430. 
Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2726893. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2728857. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2721216. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2722213. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2713972. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2753213. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2729918. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2743375. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2739237. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2749453. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2744953. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2716379. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2718252. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2726083. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2728939. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751550. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2739104. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2749422. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2712666. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2713600. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2751297. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2716049. 
Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2731725. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2726086. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2711402. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2468067. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2752180. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2485800. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2727373. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2742064. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2753979. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2730871. Maximum sequence length: 2049, sample length: 5383 [default0]:Skipping sample id=2733123. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2722610. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2724516. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2728881. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2712733. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2487155. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2496046. Maximum sequence length: 2049, sample length: 3014 [default0]:Skipping sample id=2751944. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2753759. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2742744. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2751582. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2715729. 
Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2736153. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2719719. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2715889. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2493687. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2728733. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2720814. Maximum sequence length: 2049, sample length: 6345 [default0]:Skipping sample id=2732863. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2736113. Maximum sequence length: 2049, sample length: 5247 [default0]:Skipping sample id=2724518. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2728145. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2722275. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2731156. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2751625. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2736196. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2722406. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2732305. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2754121. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2734786. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2713470. Maximum sequence length: 2049, sample length: 6151 [default0]:Skipping sample id=2728511. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2719236. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2721857. 
Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2735694. Maximum sequence length: 2049, sample length: 5980 [default0]:Skipping sample id=2737131. Maximum sequence length: 2049, sample length: 5012 [default0]:Skipping sample id=2740235. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2466041. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2467024. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2477165. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2734853. Maximum sequence length: 2049, sample length: 5544 [default0]:Skipping sample id=2722228. Maximum sequence length: 2049, sample length: 4006 [default0]:Skipping sample id=2744744. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2469796. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2722366. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745400. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2485038. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2718833. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2466899. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2714296. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2728882. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2748893. Maximum sequence length: 2049, sample length: 2970 [default0]:Skipping sample id=2754982. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2750411. Maximum sequence length: 2049, sample length: 4296 [default0]:Skipping sample id=2727249. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2753936. 
Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2741644. Maximum sequence length: 2049, sample length: 3552 [default0]:Skipping sample id=2725478. Maximum sequence length: 2049, sample length: 5971 [default0]:Skipping sample id=2726090. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2716811. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2735065. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2716044. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2494808. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2718135. Maximum sequence length: 2049, sample length: 5042 [default0]:Skipping sample id=2727506. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2726663. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2722096. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2754795. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718803. Maximum sequence length: 2049, sample length: 6810 [default0]:Skipping sample id=2747023. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2727556. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2737881. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2488319. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2742769. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2756458. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2736971. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2481707. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2742886. 
Maximum sequence length: 2049, sample length: 4901 [default0]:Skipping sample id=2716182. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2719090. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2742435. Maximum sequence length: 2049, sample length: 3220 [default0]:Skipping sample id=2725132. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2719710. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2712159. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2715315. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2713859. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2724551. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2734209. Maximum sequence length: 2049, sample length: 4971 [default0]:Skipping sample id=2715006. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2493519. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2731568. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2717730. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2490519. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2756950. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2712765. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2713584. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2736632. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2734088. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2756864. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2494507. 
Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2720820. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2738466. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2755610. Maximum sequence length: 2049, sample length: 3666 [default0]:Skipping sample id=2754712. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2756743. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2722226. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2725653. Maximum sequence length: 2049, sample length: 5549 [default0]:Skipping sample id=2718661. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2745323. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2750564. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2714531. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2717190. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2497522. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2747460. Maximum sequence length: 2049, sample length: 4021 [default0]:Skipping sample id=2749369. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2749475. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2735210. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2755038. Maximum sequence length: 2049, sample length: 4008 [default0]:Skipping sample id=2466143. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2727566. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2712087. Maximum sequence length: 2049, sample length: 4332 [default0]:Skipping sample id=2744293. 
Maximum sequence length: 2049, sample length: 3924 [default0]:Skipping sample id=2739252. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2735292. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2750023. Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2714258. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2745780. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2749970. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2471165. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2741621. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2743839. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2721919. Maximum sequence length: 2049, sample length: 3285 [default0]:Skipping sample id=2749495. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2714394. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2499110. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2489010. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2727935. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2734952. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2483879. Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2738436. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2745746. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2732735. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2737857. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2722250. 
Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2742029. Maximum sequence length: 2049, sample length: 4860 [default0]:Skipping sample id=2755613. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2482612. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2741438. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2498639. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2719775. Maximum sequence length: 2049, sample length: 4022 [default0]:Skipping sample id=2726554. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2755909. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2736280. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2712931. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2729231. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2753243. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2726502. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2484823. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2749099. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2721213. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2471268. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2753548. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2719379. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2738880. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2729162. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2715556. 
Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2483993. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2731390. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2746777. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2728646. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2739998. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2711276. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2720186. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2732208. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2744468. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2752534. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2493153. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2477818. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2498855. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2716856. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2492901. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2736723. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2736055. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2744244. Maximum sequence length: 2049, sample length: 6055 [default0]:Skipping sample id=2724108. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2712806. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2713835. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2744846. 
Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2724948. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2722112. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2716689. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2743456. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2739195. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2729458. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2725960. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2718464. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2714810. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2727768. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2746124. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2756149. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2715839. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2747290. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2726345. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2739192. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2732563. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2752514. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2754312. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2471001. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725267. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2733789. 
Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2729041. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2753859. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2742489. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2728325. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2722694. Maximum sequence length: 2049, sample length: 4555 [default0]:Skipping sample id=2495805. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2712112. Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2487891. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2481185. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2735425. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2498614. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2467367. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2485723. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2725811. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2712052. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2725607. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2738002. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2753399. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2488065. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2732231. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2729010. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2712722. 
Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2725550. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2478600. Maximum sequence length: 2049, sample length: 3616 [default0]:Skipping sample id=2738029. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2735827. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2716893. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2729975. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2736067. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2744271. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2731238. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2751542. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2749974. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2748730. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2722520. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2494333. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2484060. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2718051. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2484021. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2729150. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2723144. Maximum sequence length: 2049, sample length: 3753 [default0]:Skipping sample id=2721485. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2731420. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2725917. 
Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2746693. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2731575. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2747783. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2725201. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2716553. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2734041. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2712005. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2498927. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2732458. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2750442. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2725103. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2738762. Maximum sequence length: 2049, sample length: 6433 [default0]:Skipping sample id=2726460. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2715976. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2749784. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2745888. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2717879. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2752480. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2715936. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2499138. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2723959. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2746064. 
Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2724542. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2732588. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2723490. Maximum sequence length: 2049, sample length: 2912 [default0]:Skipping sample id=2755240. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2733087. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2727974. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2735576. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2715349. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2712066. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2716004. Maximum sequence length: 2049, sample length: 4214 [default0]:Skipping sample id=2727623. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2468761. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2494349. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2736578. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2724237. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2725529. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2739906. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2495634. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2732321. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2718530. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2750986. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2466808. 
Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2754781. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2732503. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2737499. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2720194. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2726686. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2711599. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2722439. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2732548. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2741625. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2716988. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2756740. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2711643. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2716844. Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2747864. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2487183. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2754771. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2738663. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2726979. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2757094. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2739523. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2746263. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2726532. 
Maximum sequence length: 2049, sample length: 5192 [default0]:Skipping sample id=2722287. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2740521. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2729494. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2733835. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2491761. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2739554. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2744886. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2731631. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2754965. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2747353. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2749144. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2722922. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2748585. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2467587. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2719829. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2726483. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2723205. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2721491. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2752351. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2499274. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2734086. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2741756. 
Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2754917. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2746113. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2479824. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2717327. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2733249. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2482532. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2477451. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2740432. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2726307. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2739254. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2751969. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2465841. Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2724962. Maximum sequence length: 2049, sample length: 3963 [default0]:Skipping sample id=2726624. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2751255. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2757085. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2479799. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2714563. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2492139. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2725890. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2728799. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2711964. 
Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2713406. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2725766. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2498463. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2729268. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2711968. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2729748. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2732163. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2722470. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2739547. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2480040. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2469232. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2728222. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2752177. Maximum sequence length: 2049, sample length: 5051 [default0]:Skipping sample id=2490325. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2754616. Maximum sequence length: 2049, sample length: 3340 [default0]:Skipping sample id=2746610. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2718286. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2726309. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2751822. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2718800. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2720957. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2743085. 
Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2713794. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2728152. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2739150. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2487844. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2728351. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2744280. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2739752. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2493848. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2728557. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2490963. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2728309. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2735723. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2752117. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2722681. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484232. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2722326. Maximum sequence length: 2049, sample length: 4055 [default0]:Skipping sample id=2466617. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2725894. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2741022. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2734380. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2751786. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2716455. 
Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2711207. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2742955. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2740759. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2733282. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2498399. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2715607. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2753378. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2494150. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2738102. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2490175. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2740935. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2724133. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2723698. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2712821. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2741055. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2467339. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2752663. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2724411. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2745141. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2716933. Maximum sequence length: 2049, sample length: 4166 [default0]:Skipping sample id=2471192. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2711938. 
Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2730074. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2754936. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2743013. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2750780. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2738926. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2745665. Maximum sequence length: 2049, sample length: 4574 [default0]:Skipping sample id=2723250. Maximum sequence length: 2049, sample length: 5388 [default0]:Skipping sample id=2740243. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2755681. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2482473. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2756316. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2741125. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2755462. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2717537. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2713070. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2741092. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2756602. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2735304. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727547. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2481279. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2724064. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2745689. 
Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2477262. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2741969. Maximum sequence length: 2049, sample length: 4347 [default0]:Skipping sample id=2742996. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2717498. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2736673. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2729255. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2727117. Maximum sequence length: 2049, sample length: 5302 [default0]:Skipping sample id=2725281. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2467426. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2714792. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2744683. Maximum sequence length: 2049, sample length: 3456 [default0]:Skipping sample id=2750511. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2486696. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2748187. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2737462. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2743798. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2487993. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2727374. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2736285. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2717704. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2748292. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2718093. 
Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2741790. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2749186. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2750892. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2715059. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2713961. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2489129. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2480004. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2746762. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2740588. Maximum sequence length: 2049, sample length: 3350 [default0]:Skipping sample id=2714938. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2743388. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2493180. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2747893. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2717258. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2734566. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2718337. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2739080. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2726122. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2734577. Maximum sequence length: 2049, sample length: 4026 [default0]:Skipping sample id=2471337. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2756541. Maximum sequence length: 2049, sample length: 5052 [default0]:Skipping sample id=2734609. 
Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2718570. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2741773. Maximum sequence length: 2049, sample length: 4262 [default0]:Skipping sample id=2733076. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2468149. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2471341. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2736881. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2746874. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2723157. Maximum sequence length: 2049, sample length: 4333 [default0]:Skipping sample id=2719835. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2756362. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2739545. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2729855. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2480105. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2749717. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2720173. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2483364. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2741905. Maximum sequence length: 2049, sample length: 4671 [default0]:Skipping sample id=2486066. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2486084. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2736644. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2477885. Maximum sequence length: 2049, sample length: 2773 [default0]:Skipping sample id=2729767. 
Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2755224. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2716121. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2718918. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2716356. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2752589. Maximum sequence length: 2049, sample length: 4018 [default0]:Skipping sample id=2719813. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2736819. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2723543. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2752060. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2717237. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2711754. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744753. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2750750. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2749851. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2749027. Maximum sequence length: 2049, sample length: 4109 [default0]:Skipping sample id=2748234. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2751176. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2718965. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2742528. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2728987. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2753098. Maximum sequence length: 2049, sample length: 6262 [default0]:Skipping sample id=2716324. 
Maximum sequence length: 2049, sample length: 4638 [default0]:Skipping sample id=2735465. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2726534. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2495928. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2492517. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2754626. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2735392. Maximum sequence length: 2049, sample length: 4312 [default0]:Skipping sample id=2754190. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2488328. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2731103. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2712147. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2744538. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2752262. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2741654. Maximum sequence length: 2049, sample length: 4753 [default0]:Skipping sample id=2717524. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2730829. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2717564. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2713384. Maximum sequence length: 2049, sample length: 4473 [default0]:Skipping sample id=2737102. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2714017. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2747745. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2740339. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2715023. 
Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2741566. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2741048. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2755233. Maximum sequence length: 2049, sample length: 4605 [default0]:Skipping sample id=2720886. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2738561. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2490554. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2738879. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2752794. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2727775. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2745096. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2486517. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2755004. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2725867. Maximum sequence length: 2049, sample length: 3920 [default0]:Skipping sample id=2740979. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2734217. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2717458. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2737384. Maximum sequence length: 2049, sample length: 3338 [default0]:Skipping sample id=2728298. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2721388. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2744935. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2736954. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2729297. 
Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2746308. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2748921. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2732139. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2484587. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2746283. Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2719120. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2745203. Maximum sequence length: 2049, sample length: 6240 [default0]:Skipping sample id=2720078. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2712047. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2716499. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2752038. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2757041. Maximum sequence length: 2049, sample length: 4758 [default0]:Skipping sample id=2740538. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2746206. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2747004. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2749914. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2724132. Maximum sequence length: 2049, sample length: 4652 [default0]:Skipping sample id=2753214. Maximum sequence length: 2049, sample length: 3908 [default0]:Skipping sample id=2741246. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718791. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2712476. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2736143. 
Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2724113. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2734220. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2756452. Maximum sequence length: 2049, sample length: 4503 [default0]:Skipping sample id=2712369. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2722300. Maximum sequence length: 2049, sample length: 3853 [default0]:Skipping sample id=2745127. Maximum sequence length: 2049, sample length: 2654 [default0]:Skipping sample id=2713993. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2756641. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2728066. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2740634. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2481657. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2749925. Maximum sequence length: 2049, sample length: 3461 [default0]:Skipping sample id=2753996. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2725884. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2749496. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2712129. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2728885. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2467703. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2756848. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2479110. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2747331. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2497538. 
Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2491900. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2487368. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2716431. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756274. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2722146. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2741230. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2726362. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2717667. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2756727. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2753211. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2745814. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2732885. Maximum sequence length: 2049, sample length: 3200 [default0]:Skipping sample id=2753084. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2481039. Maximum sequence length: 2049, sample length: 3112 [default0]:Skipping sample id=2731739. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2713103. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2755227. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2742130. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2727863. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2745989. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2741678. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2724036. 
Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2712220. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2492610. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2496416. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2735002. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2470979. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2737695. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2743349. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2728822. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2717476. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2714610. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2713850. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2479550. Maximum sequence length: 2049, sample length: 3543 [default0]:Skipping sample id=2727718. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2750184. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2745862. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2495671. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2742667. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2713440. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2733394. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2716157. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2711326. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2718227. 
Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2745427. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2711682. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2734915. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2467942. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2492588. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2468078. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2746126. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2740306. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2492946. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2737183. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2735560. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2478950. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2737202. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2750809. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2489748. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2726614. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2745990. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2744263. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2741992. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2477589. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2712398. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2723693. 
Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2748840. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2756810. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2488156. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2734147. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2751249. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2756549. Maximum sequence length: 2049, sample length: 5628 [default0]:Skipping sample id=2719314. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2724724. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2750027. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2752744. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2751560. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2728604. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2753557. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2495237. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2714063. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2736209. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2730498. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2732181. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2746291. Maximum sequence length: 2049, sample length: 5623 [default0]:Skipping sample id=2736195. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2728153. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2748281. 
Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2493445. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2756862. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2726594. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2713592. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2738841. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2712574. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2731806. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2743540. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2740492. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2724900. Maximum sequence length: 2049, sample length: 3863 [default0]:Skipping sample id=2499237. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2479682. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2493361. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2717313. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2731635. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2722738. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2743639. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2744448. Maximum sequence length: 2049, sample length: 6630 [default0]:Skipping sample id=2753843. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2728959. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2716396. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2482374. 
Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2735855. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2730018. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2732294. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2725105. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2728619. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2752607. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2715114. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2483105. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2713526. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2737553. Maximum sequence length: 2049, sample length: 4365 [default0]:Skipping sample id=2737144. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2483193. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2726295. Maximum sequence length: 2049, sample length: 3309 [default0]:Skipping sample id=2723307. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2735453. Maximum sequence length: 2049, sample length: 3714 [default0]:Skipping sample id=2727716. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2730293. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2727758. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2741071. Maximum sequence length: 2049, sample length: 3991 [default0]:Skipping sample id=2721964. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2737474. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2750609. 
Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2719389. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2730026. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2714493. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2751488. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2746774. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2717071. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2725670. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2724766. Maximum sequence length: 2049, sample length: 4433 [default0]:Skipping sample id=2737533. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2720459. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2713312. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2749218. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2720720. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2737615. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2725234. Maximum sequence length: 2049, sample length: 4642 [default0]:Skipping sample id=2747179. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2712009. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2721547. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2711053. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2496051. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2756511. Maximum sequence length: 2049, sample length: 3417 [default0]:Skipping sample id=2741673. 
Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2748641. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2712389. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2728771. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2728914. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2723991. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2737718. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2717440. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2481773. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2731044. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2743721. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2714965. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2718363. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2725828. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2720553. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2721179. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2715982. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2737793. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2728788. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732239. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2737933. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2734779. Maximum sequence length: 2049, sample length: 6629 [default0]:Skipping sample id=2493981. 
Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2715883. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2737792. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2754641. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2754075. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2726055. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2477132. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2716355. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2751182. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2729115. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2718975. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2715345. Maximum sequence length: 2049, sample length: 2971 [default0]:Skipping sample id=2714798. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2753797. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2752331. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2754613. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2722764. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2751515. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2493610. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2742267. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2736068. Maximum sequence length: 2049, sample length: 2964 [default0]:Skipping sample id=2470943. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2720212. 
Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2745124. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2742647. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2722292. Maximum sequence length: 2049, sample length: 7278 [default0]:Skipping sample id=2741850. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2749098. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2719482. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2734049. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2494661. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2711568. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2713729. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2745719. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2715536. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2723326. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2735904. Maximum sequence length: 2049, sample length: 6421 [default0]:Skipping sample id=2725149. Maximum sequence length: 2049, sample length: 4786 [default0]:Skipping sample id=2713764. Maximum sequence length: 2049, sample length: 5766 [default0]:Skipping sample id=2730683. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2745208. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2742797. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2725284. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2737253. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2725229. 
Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2489408. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2723306. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2746203. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2729069. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2717702. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2755401. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2738395. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2741477. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2749046. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2717808. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2719292. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2497589. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2723439. Maximum sequence length: 2049, sample length: 4423 [default0]:Skipping sample id=2484253. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2712643. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2734416. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2718623. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2738258. Maximum sequence length: 2049, sample length: 4341 [default0]:Skipping sample id=2496373. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2730302. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2736776. Maximum sequence length: 2049, sample length: 4669 [default0]:Skipping sample id=2736079. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2734376. Maximum sequence length: 2049, sample length: 5363 [default0]:Skipping sample id=2724437. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2496950. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2717368. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2722120. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2742024. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2487071. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2738207. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2722454. Maximum sequence length: 2049, sample length: 3078 [default0]:Skipping sample id=2731014. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2736927. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2719521. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2732819. Maximum sequence length: 2049, sample length: 6946 [default0]:Skipping sample id=2723444. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2477641. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2725298. Maximum sequence length: 2049, sample length: 3841 [default0]:Skipping sample id=2745061. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2713855. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2717654. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2745599. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2739816. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2486193. 
Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2728252. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2756076. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2734168. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2742601. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2757081. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2499405. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2723255. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2732124. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2747323. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2711652. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2722100. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2721985. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2732935. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2751769. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2741379. Maximum sequence length: 2049, sample length: 3710 [default0]:Skipping sample id=2487296. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2725788. Maximum sequence length: 2049, sample length: 3324 [default0]:Skipping sample id=2731251. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2493021. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2724984. Maximum sequence length: 2049, sample length: 4300 [default0]:Skipping sample id=2711759. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2741038. 
Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2742231. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2721998. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2742179. Maximum sequence length: 2049, sample length: 5829 [default0]:Skipping sample id=2733889. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2715944. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2735407. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2729009. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2741498. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2714664. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2739022. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2753112. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2730859. Maximum sequence length: 2049, sample length: 4491 [default0]:Skipping sample id=2713657. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715988. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2752241. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2728688. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2747501. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2736620. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2748741. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2747642. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2725569. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2488161. 
Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2727822. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2711250. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2748214. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2742352. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2728656. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2714060. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2745818. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2469828. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2741015. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2732480. Maximum sequence length: 2049, sample length: 4149 [default0]:Skipping sample id=2736051. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2739978. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2478140. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2717301. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2732913. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2712761. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2490694. Maximum sequence length: 2049, sample length: 4317 [default0]:Skipping sample id=2732761. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2493273. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498315. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2721123. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2488517. 
Maximum sequence length: 2049, sample length: 3066 [default0]:Skipping sample id=2757001. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2712684. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2716655. Maximum sequence length: 2049, sample length: 4683 [default0]:Skipping sample id=2486735. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2756951. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2752910. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2742034. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2466646. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2716292. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2735717. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2730939. Maximum sequence length: 2049, sample length: 2964 [default0]:Skipping sample id=2477735. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2713692. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2722203. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2740075. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2712101. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2732911. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2729289. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2485618. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2714762. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2741764. Maximum sequence length: 2049, sample length: 4230 [default0]:Skipping sample id=2733565. 
Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2481001. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2495508. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2714687. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2497560. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2726891. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2749651. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2491938. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2741543. Maximum sequence length: 2049, sample length: 4415 [default0]:Skipping sample id=2731311. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2751895. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733327. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2716586. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2741197. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2756699. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2736332. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2755571. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2738070. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2714874. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2720786. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2734425. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2740610. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2731243. 
Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2729711. Maximum sequence length: 2049, sample length: 4176 [default0]:Skipping sample id=2744027. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2732415. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2725303. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2755002. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2730439. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2754882. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2755131. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2749827. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2720756. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2731345. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2749486. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2713081. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2712284. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2724450. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2725927. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2725299. Maximum sequence length: 2049, sample length: 5238 [default0]:Skipping sample id=2749156. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2746058. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2755051. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2737101. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2714750. 
Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2493157. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2748322. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2713793. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2484897. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2727501. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2735319. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2726251. Maximum sequence length: 2049, sample length: 4295 [default0]:Skipping sample id=2716881. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2730789. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2725465. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2751770. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2479750. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2485984. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2499232. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2712501. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2739745. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2739987. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2720209. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2735024. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2479815. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2728337. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2729805. 
Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2736661. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2484122. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2467582. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2469201. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2712613. Maximum sequence length: 2049, sample length: 4563 [default0]:Skipping sample id=2751882. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2487136. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2712378. Maximum sequence length: 2049, sample length: 4485 [default0]:Skipping sample id=2755125. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2736863. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2735217. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2751682. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2747066. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2494826. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2481194. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726642. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2750790. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2748786. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2480423. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2753711. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737841. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2715769. 
Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2722382. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2718750. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2740107. Maximum sequence length: 2049, sample length: 5050 [default0]:Skipping sample id=2733751. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2728815. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2477265. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2727058. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2741295. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2718595. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2741649. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2745359. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2465908. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2749266. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2750217. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2750190. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2746325. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2746961. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2737125. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2725615. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2724608. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2718978. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2736566. 
Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2750583. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2735420. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2731372. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2738978. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2719200. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2731677. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2715117. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2736561. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2737297. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2722770. Maximum sequence length: 2049, sample length: 4393 [default0]:Skipping sample id=2741050. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2482754. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2487683. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2742534. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2711489. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2725207. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2713163. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2719974. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2711469. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2721883. Maximum sequence length: 2049, sample length: 6067 [default0]:Skipping sample id=2727385. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2732979. 
Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2488459. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2720439. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2739557. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2735314. Maximum sequence length: 2049, sample length: 5225 [default0]:Skipping sample id=2713994. Maximum sequence length: 2049, sample length: 8506 [default0]:Skipping sample id=2741634. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2725548. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2735819. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2725221. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2723402. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2716325. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2713785. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2731731. Maximum sequence length: 2049, sample length: 7319 [default0]:Skipping sample id=2747788. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2731472. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2742507. Maximum sequence length: 2049, sample length: 4308 [default0]:Skipping sample id=2755500. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2712157. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2726328. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2478126. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2725690. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2743955. 
Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2740141. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2728299. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2734125. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2756245. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2734898. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2732665. Maximum sequence length: 2049, sample length: 5281 [default0]:Skipping sample id=2756913. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2736667. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2726245. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724389. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2745992. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2711399. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2727068. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2741865. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2726696. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2719550. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2737576. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2470379. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2729558. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2729863. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2740705. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2734559. 
Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2729779. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2748190. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2735995. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2753437. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2484364. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2483936. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2494638. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2729259. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2738928. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2726097. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2732375. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2483532. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2755050. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2745225. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2750642. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2486074. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2713171. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2731078. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2754914. Maximum sequence length: 2049, sample length: 4564 [default0]:Skipping sample id=2751649. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2486859. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2712918. 
Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2757107. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2478162. Maximum sequence length: 2049, sample length: 3631 [default0]:Skipping sample id=2479832. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2755452. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2738622. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2731918. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2748664. Maximum sequence length: 2049, sample length: 3576 [default0]:Skipping sample id=2722182. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2728231. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2490438. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2724169. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2727534. Maximum sequence length: 2049, sample length: 3599 [default0]:Skipping sample id=2755642. Maximum sequence length: 2049, sample length: 3542 [default0]:Skipping sample id=2713333. Maximum sequence length: 2049, sample length: 3981 [default0]:Skipping sample id=2711899. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2752994. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2747230. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2718041. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2746552. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2726206. Maximum sequence length: 2049, sample length: 2576 [default0]:Skipping sample id=2494211. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2726803. 
Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2735925. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2730713. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2739724. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2725778. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2721253. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2724534. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2733605. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2749862. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2729326. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2752762. Maximum sequence length: 2049, sample length: 5085 [default0]:Skipping sample id=2750164. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2720272. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2718908. Maximum sequence length: 2049, sample length: 5121 [default0]:Skipping sample id=2496514. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2757027. Maximum sequence length: 2049, sample length: 3112 [default0]:Skipping sample id=2722553. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2749469. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2739303. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2467304. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2731951. Maximum sequence length: 2049, sample length: 3814 [default0]:Skipping sample id=2734654. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2722596. 
Maximum sequence length: 2049, sample length: 5754 [default0]:Skipping sample id=2732521. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2747162. Maximum sequence length: 2049, sample length: 4364 [default0]:Skipping sample id=2488030. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2496642. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2737859. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2748724. Maximum sequence length: 2049, sample length: 3901 [default0]:Skipping sample id=2726471. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2730399. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2750187. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2714915. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2496690. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2746519. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2738890. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2746166. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2744428. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2716901. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2737910. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2753146. Maximum sequence length: 2049, sample length: 5321 [default0]:Skipping sample id=2753991. Maximum sequence length: 2049, sample length: 3600 [default0]:Skipping sample id=2743929. Maximum sequence length: 2049, sample length: 6967 [default0]:Skipping sample id=2724122. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2722985. 
Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2738135. Maximum sequence length: 2049, sample length: 3350 [default0]:Skipping sample id=2745254. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2718943. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2728827. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2741034. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2719303. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2747017. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2722807. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2711471. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2713535. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2722473. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713929. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2721747. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2727151. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2739837. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2491667. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2465788. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2488679. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2751409. Maximum sequence length: 2049, sample length: 2984 [default0]:Skipping sample id=2734249. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2713196. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2469870. 
Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2715690. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712904. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2746750. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2481543. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2745678. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2741757. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2740470. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2720506. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2483001. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2745800. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2727636. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732974. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2754546. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2470818. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2478820. Maximum sequence length: 2049, sample length: 3512 [default0]:Skipping sample id=2732728. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2483771. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2738050. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2746561. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2467948. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2713688. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745242. 
Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2755799. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2750397. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2734639. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2716968. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2754949. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2733783. Maximum sequence length: 2049, sample length: 4950 [default0]:Skipping sample id=2713360. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2757119. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2490494. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2470671. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2726856. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2754970. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2754509. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2477594. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2489039. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732900. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2741574. Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2744724. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2742903. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2712919. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2736956. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2714511. 
Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2718898. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2742073. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2730257. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2494455. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2739029. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2723118. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2739180. Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2713318. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2483836. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2499388. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2715560. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2728409. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2741362. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2739217. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2720274. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2726792. Maximum sequence length: 2049, sample length: 6063 [default0]:Skipping sample id=2743898. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2735581. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2738324. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2751454. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2736905. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2717415. 
Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2734317. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2735568. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2727216. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2712831. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2723593. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2748372. Maximum sequence length: 2049, sample length: 4627 [default0]:Skipping sample id=2728945. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2738871. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2713428. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2729881. Maximum sequence length: 2049, sample length: 4143 [default0]:Skipping sample id=2746077. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2479553. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2756860. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2750590. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2494604. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2722600. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2491282. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2753024. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2711073. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2736808. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2711516. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2482534. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2737024. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2737251. Maximum sequence length: 2049, sample length: 3247 [default0]:Skipping sample id=2733619. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2755956. Maximum sequence length: 2049, sample length: 4536 [default0]:Skipping sample id=2741723. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2712236. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2728906. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2481749. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2729848. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2724545. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2735373. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2480517. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2751072. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2743702. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2732218. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2746901. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2712124. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2496862. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2749268. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2735780. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2740766. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2715926. 
Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2741738. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2749140. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2721089. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2757046. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2753384. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2722157. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2754305. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2496806. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2720665. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2742798. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2741391. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2743480. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2717298. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2731132. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2741222. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2745214. Maximum sequence length: 2049, sample length: 6666 [default0]:Skipping sample id=2466976. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2746989. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2738445. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2710992. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2716290. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2741509. 
Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2466938. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2721731. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2749906. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2724784. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2739595. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2735273. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2721514. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2738786. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2484872. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2724252. Maximum sequence length: 2049, sample length: 8513 [default0]:Skipping sample id=2726386. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2734477. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2496060. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2726136. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2750285. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2752761. Maximum sequence length: 2049, sample length: 3431 [default0]:Skipping sample id=2487531. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2730909. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2487908. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2497507. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2742161. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2730984. 
Maximum sequence length: 2049, sample length: 5247 [default0]:Skipping sample id=2746482. Maximum sequence length: 2049, sample length: 2977 [default0]:Skipping sample id=2730395. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2712595. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2497415. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2497495. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2723245. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2749568. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2718656. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2753027. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2735386. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2742412. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2722272. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2712151. Maximum sequence length: 2049, sample length: 5070 [default0]:Skipping sample id=2730966. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2470823. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2499152. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2755830. Maximum sequence length: 2049, sample length: 5063 [default0]:Skipping sample id=2718210. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2479657. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2713076. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2720980. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2711832. 
Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2752893. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2744269. Maximum sequence length: 2049, sample length: 3540 [default0]:Skipping sample id=2493856. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2716077. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2728699. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2729275. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2748498. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2747308. Maximum sequence length: 2049, sample length: 5837 [default0]:Skipping sample id=2738882. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2713514. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2755782. Maximum sequence length: 2049, sample length: 3645 [default0]:Skipping sample id=2726637. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2750595. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2750777. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2739209. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2715271. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2723848. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2727704. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2718551. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2753127. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2742097. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2727792. 
Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2470222. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2727954. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2734941. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2497531. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2738181. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2732396. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2724143. Maximum sequence length: 2049, sample length: 4159 [default0]:Skipping sample id=2718006. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2741873. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2716585. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2747163. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2738862. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719321. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2749503. Maximum sequence length: 2049, sample length: 4554 [default0]:Skipping sample id=2726743. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2752123. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2711006. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2727725. Maximum sequence length: 2049, sample length: 5737 [default0]:Skipping sample id=2751650. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2711323. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2746090. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2747022. 
Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2730067. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2750971. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2721073. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2470414. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2738209. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2723484. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2727348. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2739765. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2750272. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2749031. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2713625. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2753277. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2713693. Maximum sequence length: 2049, sample length: 4981 [default0]:Skipping sample id=2743368. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2729120. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2722220. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2715972. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744397. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2734465. Maximum sequence length: 2049, sample length: 6007 [default0]:Skipping sample id=2752905. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2757011. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2722338. 
Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2720983. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2719911. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2481493. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2728109. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2753204. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2467785. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2745423. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2742579. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2714418. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2723049. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2722921. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2754245. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2747619. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2478013. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2740929. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2722853. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2730252. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2722688. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2484134. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2724629. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2715357. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2747841. 
Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2726177. Maximum sequence length: 2049, sample length: 2957 [default0]:Skipping sample id=2739688. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2754724. Maximum sequence length: 2049, sample length: 4926 [default0]:Skipping sample id=2496818. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2487370. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2729821. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2749318. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2722816. Maximum sequence length: 2049, sample length: 4341 [default0]:Skipping sample id=2723344. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2711060. Maximum sequence length: 2049, sample length: 4823 [default0]:Skipping sample id=2723595. Maximum sequence length: 2049, sample length: 5040 [default0]:Skipping sample id=2753826. Maximum sequence length: 2049, sample length: 4281 [default0]:Skipping sample id=2728291. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2722542. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2711565. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2742614. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2722977. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2470755. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2491748. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2728933. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2729928. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2734914. 
Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2492307. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2725294. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2488173. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2743103. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2478631. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2725966. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2728093. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726880. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2498825. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2731935. Maximum sequence length: 2049, sample length: 6674 [default0]:Skipping sample id=2723053. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2756811. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2492481. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2730536. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2714100. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2715595. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2481702. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2726405. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2732546. Maximum sequence length: 2049, sample length: 14247 [default0]:Skipping sample id=2712915. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2740006. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2750396. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2727004. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2481828. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2732073. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2724052. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2730797. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2479275. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2727538. Maximum sequence length: 2049, sample length: 5060 [default0]:Skipping sample id=2483334. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2742854. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2726125. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2739447. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2720248. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2731580. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2754943. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2756474. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2735393. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2741353. Maximum sequence length: 2049, sample length: 2977 [default0]:Skipping sample id=2752323. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2723750. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2742525. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2736480. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2723079. 
Maximum sequence length: 2049, sample length: 4021 [default0]:Skipping sample id=2733255. Maximum sequence length: 2049, sample length: 6292 [default0]:Skipping sample id=2729923. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2717603. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2478778. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2723323. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2717964. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2752273. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2737449. Maximum sequence length: 2049, sample length: 4032 [default0]:Skipping sample id=2752470. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2754689. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2754906. Maximum sequence length: 2049, sample length: 4449 [default0]:Skipping sample id=2492441. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2729359. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2747811. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2732533. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2745188. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2755929. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2745362. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2744779. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2494776. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2728378. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2750607. 
Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2746105. Maximum sequence length: 2049, sample length: 3482 [default0]:Skipping sample id=2733438. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2729151. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2722102. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2741817. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2717484. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2482091. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2722708. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2747322. Maximum sequence length: 2049, sample length: 6530 [default0]:Skipping sample id=2728830. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2720298. Maximum sequence length: 2049, sample length: 3622 [default0]:Skipping sample id=2470706. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2482443. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2739969. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2748552. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2492278. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2737262. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2755995. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2717718. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2723784. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2747419. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2716008. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2718248. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2729961. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2742898. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2756886. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2725754. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2719609. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2730730. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2744814. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2466458. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2737022. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2757098. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2733435. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2719558. Maximum sequence length: 2049, sample length: 6465 [default0]:Skipping sample id=2494093. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2714040. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2719646. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2747034. Maximum sequence length: 2049, sample length: 4678 [default0]:Skipping sample id=2739664. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2483295. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2478642. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2745154. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2724378. 
Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2482737. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2486812. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2753379. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2494022. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2723401. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2753387. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2746491. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2723312. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2742503. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2712320. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2720489. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2717157. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2470464. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2736318. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2734140. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2724432. Maximum sequence length: 2049, sample length: 3522 [default0]:Skipping sample id=2465930. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2498486. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2489370. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2734919. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2744684. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2716211. 
Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2498731. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2748123. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2481514. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2487686. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2490739. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2726455. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2731330. Maximum sequence length: 2049, sample length: 6514 [default0]:Skipping sample id=2470250. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2720521. Maximum sequence length: 2049, sample length: 3467 [default0]:Skipping sample id=2712197. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2711377. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2714006. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2732528. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2723019. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2726829. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2721405. Maximum sequence length: 2049, sample length: 3938 [default0]:Skipping sample id=2756796. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2718243. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2736117. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2720773. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2466347. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2740510. 
Maximum sequence length: 2049, sample length: 3679 [default0]:Skipping sample id=2721774. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2480430. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2487328. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2753420. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2479289. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2743116. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2485551. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2712548. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2718564. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2732989. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2714255. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2717152. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2735079. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2487581. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2732708. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2750434. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2729812. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2715251. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2756347. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2713613. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2471282. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2746975. 
Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2733246. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2754598. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2753244. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2752110. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2740044. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2754447. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2751983. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2724632. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2737962. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2736565. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2721128. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2481997. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2750866. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2487450. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2487982. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2740294. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2748088. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2486294. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2723993. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2752640. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2724027. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2730350. 
Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2752251. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2711442. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2743944. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2718151. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2712108. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2715506. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2723475. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748305. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2736976. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2719222. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2725337. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2726439. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2711036. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2468841. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2723391. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2481884. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2488695. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2491217. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2751248. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2727536. Maximum sequence length: 2049, sample length: 5281 [default0]:Skipping sample id=2727445. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2717635. 
Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2471091. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2724783. Maximum sequence length: 2049, sample length: 3889 [default0]:Skipping sample id=2748168. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2723174. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2756980. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2711165. Maximum sequence length: 2049, sample length: 5826 [default0]:Skipping sample id=2716400. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2719987. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2724117. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2720448. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2727279. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2738464. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2725987. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2493034. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2483841. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2712758. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2723538. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2716423. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2730363. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2735459. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2721076. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2755897. 
Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2718502. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2745555. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2728287. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2743458. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2714869. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2740671. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2740713. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2742661. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2724777. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2726409. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2741086. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2731642. Maximum sequence length: 2049, sample length: 5927 [default0]:Skipping sample id=2721495. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2740803. Maximum sequence length: 2049, sample length: 5958 [default0]:Skipping sample id=2724325. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2735635. Maximum sequence length: 2049, sample length: 5600 [default0]:Skipping sample id=2737064. Maximum sequence length: 2049, sample length: 6522 [default0]:Skipping sample id=2711706. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2742806. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2754424. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2729409. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2734809. 
Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2753467. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2718044. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2713363. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2733644. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2720002. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2755890. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2734509. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2754645. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2727307. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2734305. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2737591. Maximum sequence length: 2049, sample length: 4688 [default0]:Skipping sample id=2751611. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2738259. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2470719. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2711526. Maximum sequence length: 2049, sample length: 4683 [default0]:Skipping sample id=2754529. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2733022. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2723724. Maximum sequence length: 2049, sample length: 3649 [default0]:Skipping sample id=2738918. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2715851. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2732764. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2718398. 
Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2729309. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2716888. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2736978. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726899. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2733420. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2737460. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2737653. Maximum sequence length: 2049, sample length: 4435 [default0]:Skipping sample id=2492315. Maximum sequence length: 2049, sample length: 4266 [default0]:Skipping sample id=2743201. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2725818. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2477489. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2724084. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2746216. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2489378. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2757110. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2719014. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2720618. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2739543. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2712058. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2721174. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2466496. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2721905. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2756451. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2716360. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2477084. Maximum sequence length: 2049, sample length: 3590 [default0]:Skipping sample id=2741006. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2737360. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2725775. Maximum sequence length: 2049, sample length: 5860 [default0]:Skipping sample id=2748311. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2720336. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2748798. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2725145. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2714538. Maximum sequence length: 2049, sample length: 4775 [default0]:Skipping sample id=2741242. Maximum sequence length: 2049, sample length: 5159 [default0]:Skipping sample id=2742609. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2723763. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2751463. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2721823. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2712185. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2727814. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2711962. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2750034. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2754686. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2713950. 
Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2732364. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2749625. Maximum sequence length: 2049, sample length: 4447 [default0]:Skipping sample id=2751705. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2466073. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2724848. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2714680. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2711313. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2715484. Maximum sequence length: 2049, sample length: 5393 [default0]:Skipping sample id=2742245. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2467794. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2484807. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2724301. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2756947. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2487395. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2750354. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2713673. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2747969. Maximum sequence length: 2049, sample length: 6263 [default0]:Skipping sample id=2716294. Maximum sequence length: 2049, sample length: 5826 [default0]:Skipping sample id=2722073. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2739168. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2713708. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2728334. 
Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2740346. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2736586. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2735422. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2488579. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2481074. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2730041. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2713415. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2733659. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2726447. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2723865. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2750834. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2489781. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2722592. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2724683. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2749078. Maximum sequence length: 2049, sample length: 5543 [default0]:Skipping sample id=2728206. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2736440. Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2491674. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2746163. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2722143. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2746749. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2754187. 
Maximum sequence length: 2049, sample length: 6973 [default0]:Skipping sample id=2493267. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2725054. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2725295. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2712505. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2729348. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2750557. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2746296. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2721974. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2742779. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2727353. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2724197. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2718628. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2756994. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2736495. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2735492. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2749682. Maximum sequence length: 2049, sample length: 5029 [default0]:Skipping sample id=2716069. Maximum sequence length: 2049, sample length: 5057 [default0]:Skipping sample id=2713621. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2722906. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2729683. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2751411. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2756573. 
Maximum sequence length: 2049, sample length: 3097 [default0]:Skipping sample id=2742514. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2711788. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2742332. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2479922. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2733733. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2732325. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2749428. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2741185. Maximum sequence length: 2049, sample length: 3889 [default0]:Skipping sample id=2716442. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2748698. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2717792. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2714171. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2749449. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2737109. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2751376. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2712430. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2753793. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2747766. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2717432. Maximum sequence length: 2049, sample length: 6956 [default0]:Skipping sample id=2752804. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2746330. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2495947. 
Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2747718. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2718034. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2714185. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2746780. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2754146. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2747118. Maximum sequence length: 2049, sample length: 7155 [default0]:Skipping sample id=2754433. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2716621. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2720460. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2499185. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2489901. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2742190. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2718994. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2748791. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2743889. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2736624. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2494570. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2739676. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2720668. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2498983. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2722138. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2737396. 
Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2732676. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2715958. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2749249. Maximum sequence length: 2049, sample length: 4227 [default0]:Skipping sample id=2490029. Maximum sequence length: 2049, sample length: 2727 [default0]:Skipping sample id=2746663. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2740281. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2733184. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2740617. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2729825. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2727579. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2739918. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2736567. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2490211. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2485421. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2752259. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2493833. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2741415. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2727250. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2730179. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2478830. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2719123. Maximum sequence length: 2049, sample length: 8506 [default0]:Skipping sample id=2729491. 
Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2731080. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2495894. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2478214. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2744602. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2720649. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2725843. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754833. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2717424. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2727609. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2752107. Maximum sequence length: 2049, sample length: 4551 [default0]:Skipping sample id=2721970. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2721121. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2728994. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2751420. Maximum sequence length: 2049, sample length: 5630 [default0]:Skipping sample id=2489780. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2491056. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2466499. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2489737. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2752570. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2728834. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2743751. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2733522. 
Maximum sequence length: 2049, sample length: 5032 [default0]:Skipping sample id=2497940. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2739513. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2745147. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2752998. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2712149. Maximum sequence length: 2049, sample length: 5291 [default0]:Skipping sample id=2727095. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2736934. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2756899. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2748669. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2731950. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2728670. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2744087. Maximum sequence length: 2049, sample length: 3490 [default0]:Skipping sample id=2726358. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2726192. Maximum sequence length: 2049, sample length: 5367 [default0]:Skipping sample id=2718120. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2724246. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2728346. Maximum sequence length: 2049, sample length: 3735 [default0]:Skipping sample id=2728971. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2744475. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2750600. Maximum sequence length: 2049, sample length: 4365 [default0]:Skipping sample id=2468730. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2721856. 
Maximum sequence length: 2049, sample length: 3486 [default0]:Skipping sample id=2721572. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712964. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2752386. Maximum sequence length: 2049, sample length: 3659 [default0]:Skipping sample id=2739808. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2738012. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2756473. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2730378. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2746994. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2712860. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2728642. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2748979. Maximum sequence length: 2049, sample length: 4816 [default0]:Skipping sample id=2736069. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2741312. Maximum sequence length: 2049, sample length: 4116 [default0]:Skipping sample id=2746890. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2719812. Maximum sequence length: 2049, sample length: 4209 [default0]:Skipping sample id=2744357. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2734773. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2729138. Maximum sequence length: 2049, sample length: 5678 [default0]:Skipping sample id=2484264. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2743095. Maximum sequence length: 2049, sample length: 6257 [default0]:Skipping sample id=2725847. Maximum sequence length: 2049, sample length: 4743 [default0]:Skipping sample id=2742150. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2733562. Maximum sequence length: 2049, sample length: 4077 [default0]:Skipping sample id=2720242. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2735921. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2756595. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2730666. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2495006. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2756629. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2485728. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2745482. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2713505. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2733853. Maximum sequence length: 2049, sample length: 5023 [default0]:Skipping sample id=2721105. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2715634. Maximum sequence length: 2049, sample length: 4233 [default0]:Skipping sample id=2756278. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2753758. Maximum sequence length: 2049, sample length: 6106 [default0]:Skipping sample id=2733971. Maximum sequence length: 2049, sample length: 5707 [default0]:Skipping sample id=2752810. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2723925. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2746523. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2734887. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2722482. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2734955. 
Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2712932. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735208. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2752322. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2748867. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2743956. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2730508. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2738026. Maximum sequence length: 2049, sample length: 5173 [default0]:Skipping sample id=2721881. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2744107. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2726287. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2731335. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2720959. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2757024. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2735735. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2735599. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2748862. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2495413. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2498409. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2721620. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2734137. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2731041. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2731178. 
Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2724042. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2745833. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2724249. Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2720369. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2711161. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2747305. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2754932. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2713941. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2483073. Maximum sequence length: 2049, sample length: 3520 [default0]:Skipping sample id=2744881. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2710999. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2499450. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2481620. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2713124. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2488741. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2724102. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2734534. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2743960. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2756883. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2714250. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2728226. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2755365. 
Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2740096. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2734766. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2715479. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2734129. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2719187. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2741374. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2720589. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2747192. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2493233. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2740089. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2749867. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2742513. Maximum sequence length: 2049, sample length: 3339 [default0]:Skipping sample id=2748383. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2732008. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2739051. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2711048. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2739120. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2738023. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2732778. Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2727611. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2750542. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2731171. 
Maximum sequence length: 2049, sample length: 4486 [default0]:Skipping sample id=2752530. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731301. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2732701. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2470774. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2712305. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2492787. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2711302. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2467185. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2715036. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2731082. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2727602. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2747306. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2478967. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2730669. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2737907. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2742922. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2498750. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2493335. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2731849. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2486566. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2488420. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2754142. 
Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2477491. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2724351. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2740651. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2721744. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2744553. Maximum sequence length: 2049, sample length: 5021 [default0]:Skipping sample id=2756907. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2729668. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2750426. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2748298. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2738769. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2743771. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2734838. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2751149. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2734504. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2723950. Maximum sequence length: 2049, sample length: 5132 [default0]:Skipping sample id=2738218. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2497390. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2489713. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2742322. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2734803. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2745930. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2711747. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2739177. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2722967. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2734010. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2717040. Maximum sequence length: 2049, sample length: 4714 [default0]:Skipping sample id=2748272. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2720723. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2736024. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2484568. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2720571. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2755972. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2735557. Maximum sequence length: 2049, sample length: 5221 [default0]:Skipping sample id=2749841. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2734644. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2751947. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2732258. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2728077. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2470206. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2728126. Maximum sequence length: 2049, sample length: 6292 [default0]:Skipping sample id=2487941. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2740809. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2712477. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2716456. 
Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2756585. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2720945. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2723169. Maximum sequence length: 2049, sample length: 5828 [default0]:Skipping sample id=2748465. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2712575. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715326. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2731830. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2724850. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2747977. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2477146. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2743842. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2487645. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2721833. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2725190. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2469831. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2723492. Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2714906. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2748971. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2757102. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2722417. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2713213. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2724751. 
Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2750589. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2480583. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2719078. Maximum sequence length: 2049, sample length: 6017 [default0]:Skipping sample id=2724275. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2754579. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2714658. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2738489. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2729181. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2731503. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2741810. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2754996. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2714708. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2755816. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2493497. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2736727. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2714876. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2486899. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2756211. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726945. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2718681. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2728705. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2484703. 
Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2736110. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2749170. Maximum sequence length: 2049, sample length: 4778 [default0]:Skipping sample id=2734154. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2753982. Maximum sequence length: 2049, sample length: 5276 [default0]:Skipping sample id=2720608. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2748052. Maximum sequence length: 2049, sample length: 5212 [default0]:Skipping sample id=2748078. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2719743. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2721571. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2741314. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2734604. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2723976. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2495994. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2742687. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2746353. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2716060. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2726486. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2734507. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2725641. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2756425. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2742151. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2719069. 
Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2718957. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2718588. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2720405. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2721389. Maximum sequence length: 2049, sample length: 4885 [default0]:Skipping sample id=2744659. Maximum sequence length: 2049, sample length: 4011 [default0]:Skipping sample id=2737830. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2732457. Maximum sequence length: 2049, sample length: 5310 [default0]:Skipping sample id=2713775. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2720166. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2749276. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2718325. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2740530. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2737051. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2733698. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2726291. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2753929. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2722450. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2495294. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2749236. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2745737. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2740355. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2741413. 
Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2717008. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2749777. Maximum sequence length: 2049, sample length: 4451 [default0]:Skipping sample id=2726258. Maximum sequence length: 2049, sample length: 3993 [default0]:Skipping sample id=2732300. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2735225. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2730532. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2484156. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2734099. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2710982. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2748927. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2729284. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2738743. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2731248. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2725531. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2495831. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2735929. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2718521. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2729054. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2745428. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2711401. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2479283. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2496361. 
Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2726701. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2737842. Maximum sequence length: 2049, sample length: 3926 [default0]:Skipping sample id=2741993. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2715747. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2715943. Maximum sequence length: 2049, sample length: 6524 [default0]:Skipping sample id=2729478. Maximum sequence length: 2049, sample length: 5303 [default0]:Skipping sample id=2716622. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2755313. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2720114. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2715579. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712745. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2754113. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2739769. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2744179. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2748188. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2746702. Maximum sequence length: 2049, sample length: 4385 [default0]:Skipping sample id=2733281. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2716214. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726351. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2750736. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2720880. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2717989. 
Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2746844. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2735831. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2753611. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2737565. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2729704. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2744765. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2721326. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2729448. Maximum sequence length: 2049, sample length: 4026 [default0]:Skipping sample id=2728169. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2736293. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2719190. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2736645. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2728245. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2749891. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2724866. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2713315. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2748393. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2752740. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2716179. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2746459. Maximum sequence length: 2049, sample length: 6153 [default0]:Skipping sample id=2748639. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2740085. 
Maximum sequence length: 2049, sample length: 6665 [default0]:Skipping sample id=2488862. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2751425. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2718183. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2727844. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2730256. Maximum sequence length: 2049, sample length: 3665 [default0]:Skipping sample id=2731747. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2752086. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2470547. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2712321. Maximum sequence length: 2049, sample length: 6492 [default0]:Skipping sample id=2728911. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2721551. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2722097. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2727408. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2726713. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2737269. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2495538. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2731341. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2718722. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2751399. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2711727. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2714891. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2714181. 
Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2721077. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2740822. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735010. Maximum sequence length: 2049, sample length: 4143 [default0]:Skipping sample id=2752408. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2721722. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2738815. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2719390. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2468100. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2739766. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2496698. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2754639. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2712230. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2745947. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2479099. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2749993. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2469942. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2730866. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2742893. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2737524. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2486669. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2718825. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2737001. 
Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2715113. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2713570. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2741010. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2746334. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2722727. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2744639. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2749816. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2747279. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2721276. Maximum sequence length: 2049, sample length: 5020 [default0]:Skipping sample id=2724003. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2482117. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2722023. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2750145. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2750318. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2711278. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2495512. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2754257. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2755789. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2729189. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2737008. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2495047. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2496982. 
Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2742041. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2717757. Maximum sequence length: 2049, sample length: 5357 [default0]:Skipping sample id=2726959. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2747735. Maximum sequence length: 2049, sample length: 5465 [default0]:Skipping sample id=2726885. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2483466. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2727713. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2756363. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2486278. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2713872. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2748788. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2717823. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2721534. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2720014. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2717859. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2752345. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2467842. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2756898. Maximum sequence length: 2049, sample length: 3059 [default0]:Skipping sample id=2715057. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2479185. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2755472. Maximum sequence length: 2049, sample length: 6671 [default0]:Skipping sample id=2740921. 
Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2731265. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2722442. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2480944. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2739639. Maximum sequence length: 2049, sample length: 8471 [default0]:Skipping sample id=2723997. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2754249. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2751315. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2728182. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2721549. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2737423. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2717343. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2727545. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2725647. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2749778. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2737987. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725240. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2752116. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2478000. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2471279. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712097. Maximum sequence length: 2049, sample length: 4068 [default0]:Skipping sample id=2756209. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2720455. 
Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2711218. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2750236. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2745368. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2733273. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2711990. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2733109. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2720003. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2743387. Maximum sequence length: 2049, sample length: 4342 [default0]:Skipping sample id=2745639. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2722855. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2491301. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2712839. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2753545. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2739211. Maximum sequence length: 2049, sample length: 2929 [default0]:Skipping sample id=2713828. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2741302. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2741860. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2716366. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2725148. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2490107. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2721818. Maximum sequence length: 2049, sample length: 4974 [default0]:Skipping sample id=2721935. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2736245. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2729450. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2479516. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2731589. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2744708. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2731688. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2729000. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2490050. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2752924. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2711241. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2720349. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2733819. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2712463. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2482768. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2751912. Maximum sequence length: 2049, sample length: 4760 [default0]:Skipping sample id=2719219. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2714194. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2730774. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2713540. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2743426. Maximum sequence length: 2049, sample length: 4657 [default0]:Skipping sample id=2736022. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2731742. 
Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2731277. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2484448. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751537. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2746638. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2756251. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2495790. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2747139. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2729582. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2491815. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2484847. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2725600. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2755662. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2714466. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2714988. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2754369. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2489600. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2498643. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2733235. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2732470. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2740019. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2746443. Maximum sequence length: 2049, sample length: 3140 [default0]:Skipping sample id=2741551. 
Maximum sequence length: 2049, sample length: 3375 [default0]:Skipping sample id=2756979. Maximum sequence length: 2049, sample length: 4212 [default0]:Skipping sample id=2735731. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2716598. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715245. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2742562. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2752912. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2711575. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2496316. Maximum sequence length: 2049, sample length: 4325 [default0]:Skipping sample id=2743764. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2730175. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2740725. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2711231. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2753741. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2485072. Maximum sequence length: 2049, sample length: 4328 [default0]:Skipping sample id=2727573. Maximum sequence length: 2049, sample length: 7095 [default0]:Skipping sample id=2712160. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2715055. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2749060. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2713253. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2716513. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2715964. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2742222. 
Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2753401. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2728913. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2740990. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2466921. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2746732. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2721728. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2711091. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2489273. Maximum sequence length: 2049, sample length: 4083 [default0]:Skipping sample id=2752092. Maximum sequence length: 2049, sample length: 4648 [default0]:Skipping sample id=2742194. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2728647. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2721513. Maximum sequence length: 2049, sample length: 5093 [default0]:Skipping sample id=2737652. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2746360. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2739705. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2755923. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2751116. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2752784. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2730852. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2741213. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2745659. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2753512. 
Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2752316. Maximum sequence length: 2049, sample length: 4913 [default0]:Skipping sample id=2752590. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2734413. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2732154. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2711839. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2714622. Maximum sequence length: 2049, sample length: 4533 [default0]:Skipping sample id=2756351. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2753389. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2746314. Maximum sequence length: 2049, sample length: 3890 [default0]:Skipping sample id=2733137. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2718119. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2717534. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2732821. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2756097. Maximum sequence length: 2049, sample length: 3720 [default0]:Skipping sample id=2725155. Maximum sequence length: 2049, sample length: 4162 [default0]:Skipping sample id=2743355. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2733444. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2739734. Maximum sequence length: 2049, sample length: 5643 [default0]:Skipping sample id=2722984. Maximum sequence length: 2049, sample length: 6235 [default0]:Skipping sample id=2734908. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2467499. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2711235. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2754821. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2731564. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2745708. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2752022. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2755349. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2757077. Maximum sequence length: 2049, sample length: 4695 [default0]:Skipping sample id=2746371. Maximum sequence length: 2049, sample length: 5160 [default0]:Skipping sample id=2746327. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2716260. Maximum sequence length: 2049, sample length: 4878 [default0]:Skipping sample id=2479812. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2737673. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2717935. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2742533. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2714642. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2744043. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2750068. Maximum sequence length: 2049, sample length: 5174 [default0]:Skipping sample id=2722131. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2469749. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2491173. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2726647. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2498572. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2744115. 
Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2738808. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2736201. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2721740. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2754227. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2719759. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2735553. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2723419. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2495535. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2718676. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2732098. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2722716. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2713226. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2490194. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2735865. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2736516. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2752272. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2719387. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2716501. Maximum sequence length: 2049, sample length: 4903 [default0]:Skipping sample id=2734170. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2732663. Maximum sequence length: 2049, sample length: 3599 [default0]:Skipping sample id=2714088. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2725735. 
Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2741389. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2719260. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2711341. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2732803. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2714681. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2747351. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2745299. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2715777. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2735146. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2742191. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2744792. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2755556. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2757066. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2477364. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2724752. Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2732360. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2485660. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2738406. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2713329. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2719370. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2745562. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2753151. 
Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2712490. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2731602. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2720217. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2756014. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2746840. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2717886. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2725243. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2752724. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2721248. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2498523. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2494684. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2751626. Maximum sequence length: 2049, sample length: 8039 [default0]:Skipping sample id=2744412. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2726944. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2749254. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2728520. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2720681. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2720822. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2752474. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2756600. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2721232. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2722763. 
Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2482873. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2733089. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2714748. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2739542. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2749047. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2728030. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2746999. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2482589. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2744552. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2487869. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2714138. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2752820. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2743532. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2725338. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2731538. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2733530. Maximum sequence length: 2049, sample length: 3398 [default0]:Skipping sample id=2749351. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2727467. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2718629. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2734234. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2745769. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2498957. 
Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2712797. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2470377. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2712443. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2712203. Maximum sequence length: 2049, sample length: 5943 [default0]:Skipping sample id=2721432. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2721065. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2744863. Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2743504. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2715960. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2742346. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2722069. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2725685. Maximum sequence length: 2049, sample length: 4455 [default0]:Skipping sample id=2744796. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2754273. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2735401. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2487688. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2719062. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2715825. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2723328. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2745055. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2753874. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2743149. 
Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2494696. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2717212. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2747541. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2713499. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2722270. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2736033. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2752819. Maximum sequence length: 2049, sample length: 6765 [default0]:Skipping sample id=2715787. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2719196. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2741105. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2734300. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2756670. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2499141. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2724407. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2720000. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2722265. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2481449. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2734204. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2728951. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2712474. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2746909. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2730629. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2713492. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2724280. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2724208. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2731588. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2495205. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2712224. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2713043. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2488234. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723013. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2728581. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2730029. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2481465. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2741777. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2719242. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2749163. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2756649. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2721157. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2495965. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2727744. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2748648. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2713882. Maximum sequence length: 2049, sample length: 4670 [default0]:Skipping sample id=2732731. 
Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2735595. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2736052. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2721330. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2467289. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2723912. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2716176. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2734323. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2739185. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2731915. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2741356. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2756296. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2499125. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2490110. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2740924. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2716066. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2719752. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2740633. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2725608. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2727626. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2726503. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2711106. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2733196. 
Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2724698. Maximum sequence length: 2049, sample length: 6870 [default0]:Skipping sample id=2722332. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2744151. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2756630. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2740331. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2731367. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2743336. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2711102. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2716618. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2720184. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2746599. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2752128. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2485175. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2731095. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2713725. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2747520. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727062. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2734014. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2724754. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2725980. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2735161. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2492737. 
Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2732082. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2720510. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730627. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2740891. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2727231. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2742653. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2714161. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2717150. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2744346. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2725264. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2745457. Maximum sequence length: 2049, sample length: 4015 [default0]:Skipping sample id=2714985. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2486924. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2716990. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2735331. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2717591. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2722437. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2494991. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2724781. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2725204. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2752849. Maximum sequence length: 2049, sample length: 3751 [default0]:Skipping sample id=2713547. 
Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2727936. Maximum sequence length: 2049, sample length: 4754 [default0]:Skipping sample id=2498063. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2714403. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2750658. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2722796. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2742021. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2726823. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2731333. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2489133. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717815. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2744793. Maximum sequence length: 2049, sample length: 3756 [default0]:Skipping sample id=2744089. Maximum sequence length: 2049, sample length: 3001 [default0]:Skipping sample id=2723264. Maximum sequence length: 2049, sample length: 5049 [default0]:Skipping sample id=2729749. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2722561. Maximum sequence length: 2049, sample length: 6963 [default0]:Skipping sample id=2725045. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2755970. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2485220. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2734889. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2726749. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2477960. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2711314. 
Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2741605. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2716287. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2495119. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2734145. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2717414. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2486901. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2756713. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2735226. Maximum sequence length: 2049, sample length: 5512 [default0]:Skipping sample id=2486966. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2735789. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2712784. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2746037. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2711116. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2732760. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2743248. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2750665. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2730883. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2736442. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2732159. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2712001. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2743671. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2739840. 
Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2727861. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2756067. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2747302. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2741549. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2715985. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2738048. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2727580. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2737279. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2714029. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2751850. Maximum sequence length: 2049, sample length: 4213 [default0]:Skipping sample id=2725584. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2740299. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2739034. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2754817. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2746667. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2729367. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2754228. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2465761. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2757008. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2743500. Maximum sequence length: 2049, sample length: 6101 [default0]:Skipping sample id=2483007. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2729629. 
Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2711333. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2746625. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2751837. Maximum sequence length: 2049, sample length: 5241 [default0]:Skipping sample id=2724709. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2755528. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2755752. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2715921. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2749001. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2744175. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2743752. Maximum sequence length: 2049, sample length: 3804 [default0]:Skipping sample id=2750077. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2723227. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2743855. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2715902. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2741481. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2736790. Maximum sequence length: 2049, sample length: 5045 [default0]:Skipping sample id=2748416. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2734016. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2753790. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2498759. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731316. Maximum sequence length: 2049, sample length: 4978 [default0]:Skipping sample id=2712738. 
Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2717527. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2754757. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2720281. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2752502. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2754929. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2715342. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2721477. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2749963. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2735573. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2744276. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2730154. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2740027. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723623. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2751332. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2746168. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2488095. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2725203. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2738642. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2492777. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2753860. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2723040. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750801. 
Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2751598. Maximum sequence length: 2049, sample length: 3706 [default0]:Skipping sample id=2755082. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2491360. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2722839. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2741162. Maximum sequence length: 2049, sample length: 5621 [default0]:Skipping sample id=2711923. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2729171. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2489307. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2756456. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2750555. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2469866. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2717325. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2734342. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2722169. Maximum sequence length: 2049, sample length: 4721 [default0]:Skipping sample id=2741787. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2714526. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2721724. Maximum sequence length: 2049, sample length: 5796 [default0]:Skipping sample id=2723980. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2725335. Maximum sequence length: 2049, sample length: 7792 [default0]:Skipping sample id=2715052. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2739459. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2751880. 
Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2727535. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2726602. Maximum sequence length: 2049, sample length: 7319 [default0]:Skipping sample id=2728887. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2753537. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2742144. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2720715. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2492763. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2744518. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2714276. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2747931. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2743353. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2719070. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2756100. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2745854. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2752027. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2743442. Maximum sequence length: 2049, sample length: 3631 [default0]:Skipping sample id=2723983. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2721406. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2744662. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2719134. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2727923. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2725433. 
Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2719425. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2740971. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2735094. Maximum sequence length: 2049, sample length: 3767 [default0]:Skipping sample id=2755518. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2753335. Maximum sequence length: 2049, sample length: 6766 [default0]:Skipping sample id=2753931. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2716804. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2750604. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2720094. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2724320. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2713747. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2735638. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2752431. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2716841. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2756355. Maximum sequence length: 2049, sample length: 5492 [default0]:Skipping sample id=2481480. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2751335. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2750875. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2732462. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2488735. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2743343. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2465903. 
Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2756002. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2741890. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2717862. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2750548. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2745245. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2745546. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2493769. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724884. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2736628. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2466971. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2749996. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2743999. Maximum sequence length: 2049, sample length: 4788 [default0]:Skipping sample id=2723896. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2734082. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2723572. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2724942. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2732348. Maximum sequence length: 2049, sample length: 5634 [default0]:Skipping sample id=2739819. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2719423. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2491328. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713335. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2722806. 
Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2730291. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2751690. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2731462. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2498331. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2728804. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2748100. Maximum sequence length: 2049, sample length: 3933 [default0]:Skipping sample id=2743429. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2744324. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2715465. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2753382. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2718964. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2715394. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2744368. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2739339. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2723186. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2713181. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2737637. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2725375. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2739830. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2719249. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2712977. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2742536. 
Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2486988. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2731033. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2732613. Maximum sequence length: 2049, sample length: 4057 [default0]:Skipping sample id=2756112. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2713046. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2742488. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2730752. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742269. Maximum sequence length: 2049, sample length: 3906 [default0]:Skipping sample id=2737569. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2721116. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2737991. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2711205. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2716344. Maximum sequence length: 2049, sample length: 7329 [default0]:Skipping sample id=2750515. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2732700. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2732091. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2721488. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2716220. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2740279. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2754810. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2734198. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2730729. 
Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2722829. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2736536. Maximum sequence length: 2049, sample length: 4372 [default0]:Skipping sample id=2730938. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2727395. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2730396. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2723706. Maximum sequence length: 2049, sample length: 3376 [default0]:Skipping sample id=2754654. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2723158. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2484400. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2729416. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2727045. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2756017. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2751375. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2746827. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2730166. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2751296. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2496590. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2743121. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2478314. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2711661. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2718896. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2755741. 
Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2718930. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2725992. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2483582. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2716418. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2719908. Maximum sequence length: 2049, sample length: 4803 [default0]:Skipping sample id=2482186. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2747129. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2722852. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2748964. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2736260. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2715079. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2744554. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2469947. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2729886. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2755132. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2737939. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2745745. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2493901. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2723733. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2466240. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2740975. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2746703. 
Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2750920. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2750706. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2729561. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2736461. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2745839. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2731900. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2496091. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2742312. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2466535. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2729959. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2742863. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2734251. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2751826. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2493927. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2734883. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2722282. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2741304. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2734797. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2723086. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2480895. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2755975. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2728621. 
Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2748278. Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2716753. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2746043. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2725619. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2487241. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2731254. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2719917. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2714935. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2753489. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2742738. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2713892. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2745416. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2743531. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2733074. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2490346. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2748329. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2715460. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2728357. Maximum sequence length: 2049, sample length: 4928 [default0]:Skipping sample id=2718868. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2735054. Maximum sequence length: 2049, sample length: 4765 [default0]:Skipping sample id=2498060. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2726216. 
Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2747626. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2737824. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2735272. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2477695. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2718687. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2741224. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2725215. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2717045. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712371. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2736550. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2724219. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2468476. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2736681. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2754487. Maximum sequence length: 2049, sample length: 5534 [default0]:Skipping sample id=2469281. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2736875. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2736771. Maximum sequence length: 2049, sample length: 5835 [default0]:Skipping sample id=2754437. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2738329. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2726049. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2743657. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2752581. 
Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2751382. Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2489597. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2737149. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2730347. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2724060. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2723785. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2745098. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2755155. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2717037. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2750343. Maximum sequence length: 2049, sample length: 6673 [default0]:Skipping sample id=2713917. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2711946. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2724207. Maximum sequence length: 2049, sample length: 6024 [default0]:Skipping sample id=2488471. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2492863. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2750586. Maximum sequence length: 2049, sample length: 4023 [default0]:Skipping sample id=2713808. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2745529. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2482305. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2728917. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2714519. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2726264. 
Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2747650. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2737193. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2723757. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2729172. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2722281. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2489425. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2477507. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2755715. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2748069. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2479102. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2712928. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2756729. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2490190. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2482849. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2753037. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2741252. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2721910. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2495222. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2741394. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2734771. Maximum sequence length: 2049, sample length: 4291 [default0]:Skipping sample id=2713988. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2748456. 
Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2716707. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2721663. Maximum sequence length: 2049, sample length: 6080 [default0]:Skipping sample id=2755877. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2719323. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2483211. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2729586. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2731892. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2754736. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2711751. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2477345. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2736555. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2744124. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2713558. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2713881. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2466326. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2754694. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2747495. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2491143. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2722319. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2744867. Maximum sequence length: 2049, sample length: 3190 [default0]:Skipping sample id=2734419. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2714592. 
Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2736872. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2746806. Maximum sequence length: 2049, sample length: 3907 [default0]:Skipping sample id=2715934. Maximum sequence length: 2049, sample length: 5360 [default0]:Skipping sample id=2736860. Maximum sequence length: 2049, sample length: 7147 [default0]:Skipping sample id=2745631. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2714380. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2712388. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2717586. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2752326. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2750003. Maximum sequence length: 2049, sample length: 3861 [default0]:Skipping sample id=2727437. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2724539. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2713656. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2714374. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2490333. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735832. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2742826. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2714755. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2720746. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2737489. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2733577. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2725650. 
Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2754240. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2727830. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2743381. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2729569. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2739904. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2714607. Maximum sequence length: 2049, sample length: 4985 [default0]:Skipping sample id=2751504. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2713392. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2724611. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2482315. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2721764. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2714494. Maximum sequence length: 2049, sample length: 4153 [default0]:Skipping sample id=2736548. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2728272. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2730680. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2727737. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2742250. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2744273. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2736744. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2712427. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2746576. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2753581. 
Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2749148. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2729512. Maximum sequence length: 2049, sample length: 4590 [default0]:Skipping sample id=2720659. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2717053. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2756434. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2720005. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2736547. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2741069. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2745460. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2756717. Maximum sequence length: 2049, sample length: 6160 [default0]:Skipping sample id=2739394. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2745667. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2714624. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2714871. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2721317. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2730807. Maximum sequence length: 2049, sample length: 4591 [default0]:Skipping sample id=2748060. Maximum sequence length: 2049, sample length: 3713 [default0]:Skipping sample id=2730352. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2720027. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2711115. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2480842. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2494995. 
Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2484159. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2748441. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2738768. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2756252. Maximum sequence length: 2049, sample length: 5867 [default0]:Skipping sample id=2725030. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2740542. Maximum sequence length: 2049, sample length: 3914 [default0]:Skipping sample id=2726212. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2733029. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2485624. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745842. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2748708. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2481397. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2489062. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2747701. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2725037. Maximum sequence length: 2049, sample length: 4810 [default0]:Skipping sample id=2755941. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2488751. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2735477. Maximum sequence length: 2049, sample length: 4822 [default0]:Skipping sample id=2732439. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2743457. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2721264. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2722831. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2746120. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2469701. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2716973. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2747174. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2736372. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2742526. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2739921. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2491341. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2729237. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2754559. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2484114. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2492841. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2732276. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2496470. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2729979. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2738252. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2736187. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2746135. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2742238. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2727152. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2741682. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2723399. 
Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2741880. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2714623. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2711893. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2739170. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2749916. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2722964. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2490292. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2742244. Maximum sequence length: 2049, sample length: 4067 [default0]:Skipping sample id=2738146. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2748477. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2745552. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2741409. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2718014. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2731506. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2718654. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2756328. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2726139. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2719790. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2720374. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2733771. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2738381. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2728934. 
Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2477477. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2745943. Maximum sequence length: 2049, sample length: 3493 [default0]:Skipping sample id=2756044. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2733427. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2754834. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2752438. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2731401. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2734607. Maximum sequence length: 2049, sample length: 5303 [default0]:Skipping sample id=2743059. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2723879. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2729475. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2750612. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2736128. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2714581. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2746753. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2750427. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2728411. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2727482. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2728977. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2738715. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2727813. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2748593. 
Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2712516. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2488708. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2742659. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2750691. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2712706. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2714706. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2748768. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2737970. Maximum sequence length: 2049, sample length: 3667 [default0]:Skipping sample id=2753144. Maximum sequence length: 2049, sample length: 4906 [default0]:Skipping sample id=2753680. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2745658. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2728549. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2715945. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2752655. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2720680. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2487246. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2716967. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2744975. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2731437. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2721359. Maximum sequence length: 2049, sample length: 4869 [default0]:Skipping sample id=2717137. Maximum sequence length: 2049, sample length: 4571 [default0]:Skipping sample id=2746697. 
Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2721280. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2729807. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2480124. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2725073. Maximum sequence length: 2049, sample length: 3043 [default0]:Skipping sample id=2746897. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2737140. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2731286. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2732152. Maximum sequence length: 2049, sample length: 3556 [default0]:Skipping sample id=2725091. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2750717. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2468795. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2757040. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2723462. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2743616. Maximum sequence length: 2049, sample length: 4696 [default0]:Skipping sample id=2724244. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2751865. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2721117. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2755435. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745088. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2725150. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2755275. Maximum sequence length: 2049, sample length: 4956 [default0]:Skipping sample id=2732876. 
Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2737220. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2743269. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2716188. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2468173. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2730432. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2736012. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715667. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2742636. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2752993. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2724357. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2724922. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2716127. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2750024. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2733508. Maximum sequence length: 2049, sample length: 5370 [default0]:Skipping sample id=2735277. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2733769. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2747376. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2736406. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2743279. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2727940. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2753932. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2466192. 
Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2721923. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2719633. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2727864. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2746861. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2715962. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2746978. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2714704. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2713581. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2471072. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2721614. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2755399. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2720165. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2737487. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2741645. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2742949. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2743768. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2756008. Maximum sequence length: 2049, sample length: 4472 [default0]:Skipping sample id=2722345. Maximum sequence length: 2049, sample length: 4339 [default0]:Skipping sample id=2747889. Maximum sequence length: 2049, sample length: 4943 [default0]:Skipping sample id=2754479. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2715122. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2736894. 
Maximum sequence length: 2049, sample length: 4997 [default0]:Skipping sample id=2740823. Maximum sequence length: 2049, sample length: 5750 [default0]:Skipping sample id=2731432. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2477034. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2753893. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2719787. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2743124. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2745852. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2753246. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2718534. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2738339. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2739050. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2716462. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2756592. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2720452. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2748517. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2753722. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749788. Maximum sequence length: 2049, sample length: 4019 [default0]:Skipping sample id=2726381. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2753605. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2718877. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2715974. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2713629. 
Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2754056. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2716737. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2746050. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2718026. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2724126. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2734840. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2731349. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2752449. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2752668. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2484732. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2715534. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2715535. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2755061. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2712264. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2746973. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2726361. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2730957. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2719815. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2756201. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2745016. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2721539. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2737472. 
Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2467542. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2730782. Maximum sequence length: 2049, sample length: 5532 [default0]:Skipping sample id=2746401. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2751588. Maximum sequence length: 2049, sample length: 6162 [default0]:Skipping sample id=2739908. Maximum sequence length: 2049, sample length: 6555 [default0]:Skipping sample id=2748346. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2742139. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2740060. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2730383. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2734888. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2753798. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2736491. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2714970. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2725090. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2484684. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2747595. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2732514. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2485218. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2720965. Maximum sequence length: 2049, sample length: 3656 [default0]:Skipping sample id=2755526. Maximum sequence length: 2049, sample length: 4044 [default0]:Skipping sample id=2755470. Maximum sequence length: 2049, sample length: 3623 [default0]:Skipping sample id=2711330. 
Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737769. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2724625. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731886. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2754954. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2748722. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2719041. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2748504. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2732894. Maximum sequence length: 2049, sample length: 4475 [default0]:Skipping sample id=2471230. Maximum sequence length: 2049, sample length: 3104 [default0]:Skipping sample id=2732023. Maximum sequence length: 2049, sample length: 4681 [default0]:Skipping sample id=2746710. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2726133. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2732606. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2715685. Maximum sequence length: 2049, sample length: 6345 [default0]:Skipping sample id=2754135. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751792. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2754578. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2751174. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2735996. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2480341. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737929. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2748085. 
Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2755771. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2498359. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2717851. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2746722. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2728583. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2736162. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2711549. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2711223. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2483477. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2752024. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2720146. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2718047. Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2730010. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2736490. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2739160. Maximum sequence length: 2049, sample length: 5102 [default0]:Skipping sample id=2740139. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2482292. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2733944. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2754558. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2752748. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2722288. Maximum sequence length: 2049, sample length: 14264 [default0]:Skipping sample id=2746157. 
Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2717161. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2733814. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2737976. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2721258. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2712538. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2470290. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2725116. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2736972. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2718060. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2753736. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2481002. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2731230. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2718076. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2739579. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2719384. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2751494. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2483317. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2720876. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2717243. Maximum sequence length: 2049, sample length: 5466 [default0]:Skipping sample id=2747727. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2718161. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2721461. 
Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2729671. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2723459. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2712274. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2712614. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2713694. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2714372. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2718311. Maximum sequence length: 2049, sample length: 4030 [default0]:Skipping sample id=2737243. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2730827. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2467601. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2746902. Maximum sequence length: 2049, sample length: 4267 [default0]:Skipping sample id=2746794. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2716628. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2729132. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2737659. Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2729838. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2726368. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2715617. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2719040. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2714815. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2747266. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2465878. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2743171. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2716972. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2732946. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2725865. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2730896. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2732673. Maximum sequence length: 2049, sample length: 3865 [default0]:Skipping sample id=2735633. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2750818. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2729299. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2738831. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2730358. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2715012. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2756961. Maximum sequence length: 2049, sample length: 4870 [default0]:Skipping sample id=2756049. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2716359. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2751632. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2730930. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2749709. Maximum sequence length: 2049, sample length: 3057 [default0]:Skipping sample id=2716202. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2735630. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2751593. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2491577. 
Maximum sequence length: 2049, sample length: 3314 [default0]:Skipping sample id=2712262. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2756250. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2737983. Maximum sequence length: 2049, sample length: 5486 [default0]:Skipping sample id=2484836. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2745107. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2718557. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2723471. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2496419. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2734314. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2481914. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2719033. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2477385. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2737668. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2729373. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2754350. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2725723. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2736415. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2714350. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2740590. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2716088. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2726290. Maximum sequence length: 2049, sample length: 4141 [default0]:Skipping sample id=2740354. 
Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2743937. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2739346. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2746253. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2745352. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2756516. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2487823. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2496261. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2712172. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2752104. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2718935. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2746987. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2750056. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2711464. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2720400. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2724196. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2713798. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2485117. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2725423. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2711476. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2721836. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2736886. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2713273. 
Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2483129. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2753190. Maximum sequence length: 2049, sample length: 3701 [default0]:Skipping sample id=2750339. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2731681. Maximum sequence length: 2049, sample length: 5816 [default0]:Skipping sample id=2726218. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2726453. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2485454. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2743580. Maximum sequence length: 2049, sample length: 4018 [default0]:Skipping sample id=2488838. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2712686. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2743126. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2746819. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2715340. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2715040. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2740165. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2716255. Maximum sequence length: 2049, sample length: 5938 [default0]:Skipping sample id=2740577. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2731361. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2753370. Maximum sequence length: 2049, sample length: 5492 [default0]:Skipping sample id=2724945. Maximum sequence length: 2049, sample length: 3257 [default0]:Skipping sample id=2717091. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2744898. 
Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2748992. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2720429. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2712987. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2740860. Maximum sequence length: 2049, sample length: 3299 [default0]:Skipping sample id=2715447. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2728548. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2714653. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2725436. Maximum sequence length: 2049, sample length: 5997 [default0]:Skipping sample id=2733586. Maximum sequence length: 2049, sample length: 3354 [default0]:Skipping sample id=2714429. Maximum sequence length: 2049, sample length: 6465 [default0]:Skipping sample id=2711456. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2733453. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2752079. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2718366. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2749120. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2470827. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2746497. Maximum sequence length: 2049, sample length: 8032 [default0]:Skipping sample id=2465747. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2717052. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2724929. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711003. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2751433. 
Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2731391. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2748197. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2739238. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2478334. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2483411. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2733652. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2726793. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2735637. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2721929. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2729445. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2477761. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2494201. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2715627. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2718459. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2736049. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2714386. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2715395. Maximum sequence length: 2049, sample length: 5542 [default0]:Skipping sample id=2714212. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2484195. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2732340. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2754981. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2466030. 
Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2739578. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2734208. Maximum sequence length: 2049, sample length: 5115 [default0]:Skipping sample id=2468353. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2730021. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2731356. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723588. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2717134. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2711681. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2724799. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2728286. Maximum sequence length: 2049, sample length: 4316 [default0]:Skipping sample id=2723180. Maximum sequence length: 2049, sample length: 4190 [default0]:Skipping sample id=2730776. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2720138. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2714337. Maximum sequence length: 2049, sample length: 5554 [default0]:Skipping sample id=2753621. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2736635. Maximum sequence length: 2049, sample length: 4229 [default0]:Skipping sample id=2741766. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2725551. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2483415. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2724158. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2711548. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2746418. 
Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2721777. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2730260. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2746898. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2718346. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2756375. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2715999. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2722257. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2720837. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2727002. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2753224. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2734035. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2740124. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2724576. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2754383. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2735298. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2747971. Maximum sequence length: 2049, sample length: 3934 [default0]:Skipping sample id=2713911. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2485188. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2753104. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2749309. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2735586. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2749217. 
Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2726582. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2726233. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2730677. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2749740. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2734533. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2732288. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2723517. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2489817. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2714781. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2740262. Maximum sequence length: 2049, sample length: 6479 [default0]:Skipping sample id=2478806. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2751794. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2717253. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2741844. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2755936. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2728690. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2715895. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2479538. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2719886. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2711499. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2749636. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2739446. 
Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2719877. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2724872. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2484427. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2752551. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2711436. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2747280. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2498849. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2744137. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2470154. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2723499. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2752084. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2716599. Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2734106. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2738637. Maximum sequence length: 2049, sample length: 5145 [default0]:Skipping sample id=2751076. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2719109. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2730017. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2495400. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2495798. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2723119. Maximum sequence length: 2049, sample length: 3144 [default0]:Skipping sample id=2721494. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2736150. 
Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2715891. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2713622. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2728716. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2730171. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2747521. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2744383. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2492632. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2727010. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2726677. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2727342. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2755282. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2712558. Maximum sequence length: 2049, sample length: 4334 [default0]:Skipping sample id=2714022. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2731743. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2744600. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2756151. Maximum sequence length: 2049, sample length: 5435 [default0]:Skipping sample id=2720048. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2497341. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2727410. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2716303. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2746194. Maximum sequence length: 2049, sample length: 4497 [default0]:Skipping sample id=2745026. 
Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2754934. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2747566. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2726370. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2718817. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2495335. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2748607. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2724476. Maximum sequence length: 2049, sample length: 2792 [default0]:Skipping sample id=2728039. Maximum sequence length: 2049, sample length: 4792 [default0]:Skipping sample id=2743529. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2714293. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2750095. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2744147. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2725981. Maximum sequence length: 2049, sample length: 8168 [default0]:Skipping sample id=2740606. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2753255. Maximum sequence length: 2049, sample length: 5268 [default0]:Skipping sample id=2754567. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2726741. Maximum sequence length: 2049, sample length: 3788 [default0]:Skipping sample id=2736178. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2712469. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2719781. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2728110. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2735949. 
Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2716675. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2726671. Maximum sequence length: 2049, sample length: 5096 [default0]:Skipping sample id=2755273. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2726003. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2497810. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2734139. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2717691. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2727048. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2487937. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2754839. Maximum sequence length: 2049, sample length: 4067 [default0]:Skipping sample id=2740621. Maximum sequence length: 2049, sample length: 4466 [default0]:Skipping sample id=2744229. Maximum sequence length: 2049, sample length: 5528 [default0]:Skipping sample id=2736616. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2753140. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2480505. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2728892. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2468224. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2729067. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2746843. Maximum sequence length: 2049, sample length: 4807 [default0]:Skipping sample id=2716918. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2756521. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2753627. 
Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2711880. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2744722. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2719088. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2738157. Maximum sequence length: 2049, sample length: 4269 [default0]:Skipping sample id=2731487. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2727141. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2719984. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2488834. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2711417. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2743341. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2711076. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2751574. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2712355. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2733975. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2495608. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2756145. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2725905. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2713948. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2719751. Maximum sequence length: 2049, sample length: 4533 [default0]:Skipping sample id=2747511. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2493870. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2730137. 
Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2735009. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2737361. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2492353. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2742320. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2750874. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2467482. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730088. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2489088. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2498936. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2731037. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2755354. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2496838. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2479214. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2755374. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2718798. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2745388. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2496124. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2468556. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2740018. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2749577. Maximum sequence length: 2049, sample length: 6637 [default0]:Skipping sample id=2721310. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2748339. 
Maximum sequence length: 2049, sample length: 4328 [default0]:Skipping sample id=2754460. Maximum sequence length: 2049, sample length: 3456 [default0]:Skipping sample id=2756871. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2754972. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2468046. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2488707. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2747873. Maximum sequence length: 2049, sample length: 3925 [default0]:Skipping sample id=2741340. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2753681. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2720638. Maximum sequence length: 2049, sample length: 5163 [default0]:Skipping sample id=2715975. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2733538. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2749426. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2730357. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2721708. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2721298. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2720198. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2742869. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2743371. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2720651. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2723483. Maximum sequence length: 2049, sample length: 4965 [default0]:Skipping sample id=2739453. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2719032. 
Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2719891. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2727089. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2730263. Maximum sequence length: 2049, sample length: 4312 [default0]:Skipping sample id=2723236. Maximum sequence length: 2049, sample length: 3922 [default0]:Skipping sample id=2718048. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2750651. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2743217. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2750082. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2732028. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2750364. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2748626. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2746627. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2485160. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2744646. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2712072. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2737203. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2715929. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2735203. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2495401. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2731661. Maximum sequence length: 2049, sample length: 5192 [default0]:Skipping sample id=2488551. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2714261. 
Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2494460. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2730070. Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2743872. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2729393. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2725661. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714388. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2752657. Maximum sequence length: 2049, sample length: 5847 [default0]:Skipping sample id=2711327. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2741134. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2740170. Maximum sequence length: 2049, sample length: 3761 [default0]:Skipping sample id=2736152. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2742998. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2718447. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2482877. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2734556. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2720164. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2727592. Maximum sequence length: 2049, sample length: 4739 [default0]:Skipping sample id=2741627. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2754946. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2712346. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2723952. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2720101. 
Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2715636. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2710974. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2722204. Maximum sequence length: 2049, sample length: 4851 [default0]:Skipping sample id=2727568. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2752679. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2738033. Maximum sequence length: 2049, sample length: 3937 [default0]:Skipping sample id=2499158. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2745993. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2723930. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2753251. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2756409. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2752670. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2738721. Maximum sequence length: 2049, sample length: 4808 [default0]:Skipping sample id=2720064. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2743247. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2468104. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2714439. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2723910. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2730651. Maximum sequence length: 2049, sample length: 3135 [default0]:Skipping sample id=2728144. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2752054. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2730264. 
Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2499287. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2717475. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2717777. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2734964. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2733660. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2754920. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2724968. Maximum sequence length: 2049, sample length: 5529 [default0]:Skipping sample id=2711013. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2728027. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2743585. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2492828. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2718849. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2721508. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2741849. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2744754. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2712458. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2747776. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2744214. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2746565. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2754858. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2724958. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2747117. 
Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2716034. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2719885. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2713609. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2743568. Maximum sequence length: 2049, sample length: 4277 [default0]:Skipping sample id=2734388. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2489004. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2718693. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2742715. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2471308. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2482638. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2721378. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2742176. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2723561. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2488727. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2731449. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2749204. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2748692. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2484063. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2733839. Maximum sequence length: 2049, sample length: 6318 [default0]:Skipping sample id=2737940. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2468312. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2734330. 
Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2728497. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2722636. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2714873. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2738451. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2742764. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2731418. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2756779. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2728130. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2716129. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2734212. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2715797. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2477215. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2731989. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2745714. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2717884. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2716531. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2752734. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2753837. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2719793. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2739932. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2738609. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2728379. 
Maximum sequence length: 2049, sample length: 5371 [default0]:Skipping sample id=2467300. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2740037. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2718492. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2717332. Maximum sequence length: 2049, sample length: 5943 [default0]:Skipping sample id=2719250. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2730509. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2715751. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2752015. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2749006. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2750981. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2716845. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2735461. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2755350. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2744134. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2734995. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2750083. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2729480. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2499263. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2749073. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2717677. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2721816. Maximum sequence length: 2049, sample length: 4725 [default0]:Skipping sample id=2750510. 
Maximum sequence length: 2049, sample length: 4740 [default0]:Skipping sample id=2478399. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2746445. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2720548. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2736900. Maximum sequence length: 2049, sample length: 4859 [default0]:Skipping sample id=2712680. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2748004. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2756923. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2731992. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2730457. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2721305. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2734042. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2718318. Maximum sequence length: 2049, sample length: 6491 [default0]:Skipping sample id=2726446. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2722047. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2714382. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2740230. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2742986. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729657. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2720718. Maximum sequence length: 2049, sample length: 4736 [default0]:Skipping sample id=2490049. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2720662. Maximum sequence length: 2049, sample length: 5146 [default0]:Skipping sample id=2739373. 
Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2711340. Maximum sequence length: 2049, sample length: 6661 [default0]:Skipping sample id=2748361. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2756386. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2749628. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2745236. Maximum sequence length: 2049, sample length: 5050 [default0]:Skipping sample id=2730464. Maximum sequence length: 2049, sample length: 7775 [default0]:Skipping sample id=2742994. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2741676. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2497900. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2718388. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2744209. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2725431. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2724600. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2754754. Maximum sequence length: 2049, sample length: 4445 [default0]:Skipping sample id=2718787. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2723987. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2733210. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2483240. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2749130. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2727856. Maximum sequence length: 2049, sample length: 4762 [default0]:Skipping sample id=2755104. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2735839. 
Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2724477. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2732447. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2742056. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2716352. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2732590. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2724002. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2726806. Maximum sequence length: 2049, sample length: 6674 [default0]:Skipping sample id=2714654. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2731457. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2734774. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2742104. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2493488. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2742921. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2715060. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2721915. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2711942. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2728303. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2722988. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2743169. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2727196. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2483378. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2470276. 
Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2731561. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2713583. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2720074. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2491216. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2748633. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2751193. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2717467. Maximum sequence length: 2049, sample length: 5747 [default0]:Skipping sample id=2717042. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2756960. Maximum sequence length: 2049, sample length: 4391 [default0]:Skipping sample id=2753613. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2722937. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2744817. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2711617. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2733171. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2724238. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737485. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2715843. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2717373. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2717220. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2749308. Maximum sequence length: 2049, sample length: 3443 [default0]:Skipping sample id=2724170. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2715339. 
Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2753198. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2731428. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2755382. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2742180. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2727742. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2713186. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2730009. Maximum sequence length: 2049, sample length: 4523 [default0]:Skipping sample id=2718652. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2496540. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2716062. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2723591. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2723001. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2751213. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2754046. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2732804. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2726478. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2733493. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2753044. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2755649. Maximum sequence length: 2049, sample length: 6485 [default0]:Skipping sample id=2720873. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2740169. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2713860. 
Maximum sequence length: 2049, sample length: 5373 [default0]:Skipping sample id=2743689. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2732574. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2718190. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2744139. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2726240. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2480033. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2494363. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2712008. Maximum sequence length: 2049, sample length: 4571 [default0]:Skipping sample id=2716859. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2742263. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2728737. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2725371. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2726393. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2750556. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2731776. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2737650. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2750991. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2732370. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2719590. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2468697. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2734238. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2489022. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2489890. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2723624. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2469398. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2498895. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2731471. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2730806. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2726184. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2494644. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731374. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2743911. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2729650. Maximum sequence length: 2049, sample length: 6629 [default0]:Skipping sample id=2482247. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2748075. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2466064. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2737869. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2731869. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2487344. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2738577. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2728953. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2717116. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2734508. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2724330. 
Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2734456. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2713443. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2727396. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2725521. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2483865. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2731952. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2731933. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2717971. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2466757. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2714071. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2728214. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2723141. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2729936. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2725195. Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2756154. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2711101. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2496616. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2751892. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2733597. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2742566. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2724383. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2736382. 
Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2747550. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2734062. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2730606. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2721756. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2724470. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2732032. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2722901. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2753061. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2752940. Maximum sequence length: 2049, sample length: 4373 [default0]:Skipping sample id=2720613. Maximum sequence length: 2049, sample length: 5393 [default0]:Skipping sample id=2484553. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2748022. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2750623. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2736854. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2719922. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2712549. Maximum sequence length: 2049, sample length: 4764 [default0]:Skipping sample id=2713337. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2717446. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2489693. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2731973. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2737838. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2722940. 
Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2744634. Maximum sequence length: 2049, sample length: 5104 [default0]:Skipping sample id=2726451. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2752362. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2712564. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2719727. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2738982. Maximum sequence length: 2049, sample length: 4742 [default0]:Skipping sample id=2756504. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2735919. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2729965. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2752945. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2723458. Maximum sequence length: 2049, sample length: 3072 [default0]:Skipping sample id=2721207. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2727949. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719446. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2731310. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2496963. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2738710. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2753297. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2740171. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2711325. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2742014. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2756694. 
Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2730815. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2749909. Maximum sequence length: 2049, sample length: 4010 [default0]:Skipping sample id=2739995. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2721198. Maximum sequence length: 2049, sample length: 5057 [default0]:Skipping sample id=2727404. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2734960. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2713281. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2483887. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2750944. Maximum sequence length: 2049, sample length: 5017 [default0]:Skipping sample id=2734830. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2755483. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2722222. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2716794. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2756072. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2721850. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2713530. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2734954. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2730706. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2730928. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2742847. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2717197. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2477916. 
Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2730148. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2481790. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2754395. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2723304. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2724452. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2750687. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2477335. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2727882. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2722534. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2743513. Maximum sequence length: 2049, sample length: 6004 [default0]:Skipping sample id=2721542. Maximum sequence length: 2049, sample length: 5113 [default0]:Skipping sample id=2715242. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2727484. Maximum sequence length: 2049, sample length: 4058 [default0]:Skipping sample id=2721682. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2499171. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2734635. Maximum sequence length: 2049, sample length: 8224 [default0]:Skipping sample id=2717556. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2720538. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732224. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2739469. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2712811. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2713912. 
Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2756426. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2732228. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2742410. Maximum sequence length: 2049, sample length: 3375 [default0]:Skipping sample id=2750245. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2740239. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2751746. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2740502. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2718033. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2738560. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2719422. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2744298. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2737654. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2735306. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2736265. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2728582. Maximum sequence length: 2049, sample length: 4618 [default0]:Skipping sample id=2754102. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2483802. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2749232. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2717286. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2731775. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2468032. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2735139. 
Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2722865. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2482653. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2491248. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2487571. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2738694. Maximum sequence length: 2049, sample length: 3866 [default0]:Skipping sample id=2469471. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715037. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2755291. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2716103. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2721727. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2740257. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2731940. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2741064. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2742685. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2489480. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2735501. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2728181. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2721875. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2736936. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2752747. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2726714. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2744395. 
Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2713846. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2721916. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746885. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2478611. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2735220. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2715923. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2749683. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2725737. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2728473. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2724443. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2737870. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2713751. Maximum sequence length: 2049, sample length: 5688 [default0]:Skipping sample id=2711282. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2737399. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2737865. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2711299. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2742882. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2749040. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2751336. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2723631. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2751530. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2494677. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2751759. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2742782. Maximum sequence length: 2049, sample length: 4437 [default0]:Skipping sample id=2718449. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2493305. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2720069. Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2719447. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2742398. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2497197. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2712181. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2736901. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2732009. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2756761. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2713234. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2729061. Maximum sequence length: 2049, sample length: 6800 [default0]:Skipping sample id=2735693. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2499113. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2721027. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2713934. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2714690. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2720474. Maximum sequence length: 2049, sample length: 2971 [default0]:Skipping sample id=2729616. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2747048. 
Maximum sequence length: 2049, sample length: 5053 [default0]:Skipping sample id=2493400. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2716256. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2466287. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2735533. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2487019. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2497761. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2728864. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2489233. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2719253. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2746254. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2740119. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2716454. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2478536. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2740529. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2743077. Maximum sequence length: 2049, sample length: 6761 [default0]:Skipping sample id=2738411. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2498588. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2733432. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2488206. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2470949. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2716201. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2717230. 
Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2471123. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2717049. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2489386. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2732897. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2726176. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2493750. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2720852. Maximum sequence length: 2049, sample length: 8161 [default0]:Skipping sample id=2484291. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2737921. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2712419. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2465995. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2743835. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2720513. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2717326. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2748484. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2717742. Maximum sequence length: 2049, sample length: 4563 [default0]:Skipping sample id=2732918. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2717665. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2467819. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2471010. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2483086. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2742177. 
Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2730330. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2712401. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2735585. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2749932. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2720736. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2752199. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2731270. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2748253. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2735012. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2469951. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2748365. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2721645. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2466081. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2723791. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2726173. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2469386. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2740686. Maximum sequence length: 2049, sample length: 5009 [default0]:Skipping sample id=2746708. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2741846. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2716516. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2736764. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2717940. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2742290. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2754022. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2750552. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2718458. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2716532. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2746630. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2740636. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2733202. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2477910. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2753950. Maximum sequence length: 2049, sample length: 4776 [default0]:Skipping sample id=2755543. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2730168. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722507. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2712090. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2743669. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2749018. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2751439. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2738327. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2713255. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2750028. Maximum sequence length: 2049, sample length: 4528 [default0]:Skipping sample id=2721700. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2734800. 
Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2713550. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2722071. Maximum sequence length: 2049, sample length: 7290 [default0]:Skipping sample id=2489037. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2750681. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2726415. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2752404. Maximum sequence length: 2049, sample length: 4627 [default0]:Skipping sample id=2741559. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2741461. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2495833. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2750779. Maximum sequence length: 2049, sample length: 4250 [default0]:Skipping sample id=2721048. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2711068. Maximum sequence length: 2049, sample length: 4041 [default0]:Skipping sample id=2712906. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2723883. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2716991. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2744125. Maximum sequence length: 2049, sample length: 3576 [default0]:Skipping sample id=2755703. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2728106. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2753343. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2749843. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2746421. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2726266. 
Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2713566. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738122. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2736986. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2721318. Maximum sequence length: 2049, sample length: 4830 [default0]:Skipping sample id=2747990. Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2750848. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2736697. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2757021. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2736848. Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2727421. Maximum sequence length: 2049, sample length: 7261 [default0]:Skipping sample id=2491916. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2494121. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2751972. Maximum sequence length: 2049, sample length: 4814 [default0]:Skipping sample id=2727701. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2711466. Maximum sequence length: 2049, sample length: 5789 [default0]:Skipping sample id=2723368. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2737353. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2478582. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2725731. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2477227. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2732645. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2711657. 
Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2739695. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2491768. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2730410. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2722886. Maximum sequence length: 2049, sample length: 5364 [default0]:Skipping sample id=2494017. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2727733. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2730973. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2715868. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2734295. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2719591. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2715693. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2745177. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2735981. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2728284. Maximum sequence length: 2049, sample length: 4239 [default0]:Skipping sample id=2713366. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2756834. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2716883. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2724423. Maximum sequence length: 2049, sample length: 5329 [default0]:Skipping sample id=2721053. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2740615. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2748177. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2733404. 
Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2730444. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2743936. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2745524. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2732540. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2727681. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2733463. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2734285. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2712205. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2721271. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2745294. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2717933. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2481519. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2750283. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2739245. Maximum sequence length: 2049, sample length: 4545 [default0]:Skipping sample id=2751906. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2715290. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2727382. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2487432. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2720243. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2721083. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2720670. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2731180. 
Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2480524. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2726717. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2730995. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2712804. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2732712. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2752355. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2727651. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2749768. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2744445. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2747975. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2741325. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2712021. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2718785. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2752365. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2734904. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2470456. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2736505. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2745468. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2725943. Maximum sequence length: 2049, sample length: 6352 [default0]:Skipping sample id=2743662. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2745553. Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2746221. 
Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2483696. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2737697. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2750800. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2484061. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2483147. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2719707. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2471178. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2740291. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2754913. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2742816. Maximum sequence length: 2049, sample length: 3691 [default0]:Skipping sample id=2711152. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2744862. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2482259. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2724489. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2729016. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2731004. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2739658. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2728901. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2754557. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2750098. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2734255. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2732810. 
Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2746282. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2731522. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2714387. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2736926. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2753440. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2749295. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2490469. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2485014. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2753237. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2712569. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2723064. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2734734. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2713519. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2726853. Maximum sequence length: 2049, sample length: 5779 [default0]:Skipping sample id=2756081. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2754983. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2751987. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2750843. Maximum sequence length: 2049, sample length: 4338 [default0]:Skipping sample id=2715412. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2478537. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2726858. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2716641. 
Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2732746. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2748687. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2711778. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2724778. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2757109. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2735473. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2755405. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2732993. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2469977. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2735403. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2717840. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2481624. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2729674. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2494392. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2741212. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2717895. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2749632. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2754530. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2756922. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2742578. Maximum sequence length: 2049, sample length: 4034 [default0]:Skipping sample id=2735856. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2478826. 
Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2756900. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2731763. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2732802. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2741005. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2737211. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2736545. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2742589. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2723808. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2478851. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2714989. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2751306. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2721328. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2480597. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2752536. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2724870. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2731964. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2711331. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2713133. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2740654. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2483738. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2725477. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2744399. 
Maximum sequence length: 2049, sample length: 3719 [default0]:Skipping sample id=2717454. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2752498. Maximum sequence length: 2049, sample length: 5527 [default0]:Skipping sample id=2745614. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2485259. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2727349. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2729321. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2736557. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2744609. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2744000. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2723852. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2720913. Maximum sequence length: 2049, sample length: 4496 [default0]:Skipping sample id=2724910. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2714444. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2469187. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2714218. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2738331. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2754555. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2716765. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2481034. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2747243. Maximum sequence length: 2049, sample length: 6495 [default0]:Skipping sample id=2737860. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721301. 
Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2716038. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2747122. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2728710. Maximum sequence length: 2049, sample length: 6318 [default0]:Skipping sample id=2720302. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2749560. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2744072. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2757020. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2499209. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2730522. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2743197. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2755632. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2716577. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2722241. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2742572. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2742019. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2732524. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2740780. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2720961. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2747992. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2466008. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2495121. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2730359. 
Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2729261. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2479258. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2485746. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2723867. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2712193. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2725377. Maximum sequence length: 2049, sample length: 5617 [default0]:Skipping sample id=2724316. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2747409. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2721274. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2733227. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2732941. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2752082. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2752638. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2740948. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2484561. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2713466. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2753374. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2736939. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2714050. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2726825. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2722859. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2744623. 
Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2756896. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2715946. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2745464. Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2724337. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2748032. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2713389. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2742742. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2739943. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2714945. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2494425. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2717934. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2740594. Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2728747. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2734141. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2716348. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2741609. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2727018. Maximum sequence length: 2049, sample length: 4178 [default0]:Skipping sample id=2728468. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2755758. Maximum sequence length: 2049, sample length: 4231 [default0]:Skipping sample id=2752409. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2737505. Maximum sequence length: 2049, sample length: 4929 [default0]:Skipping sample id=2718205. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2741588. Maximum sequence length: 2049, sample length: 6238 [default0]:Skipping sample id=2470463. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2712495. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2713298. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2466587. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2755889. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732727. Maximum sequence length: 2049, sample length: 5336 [default0]:Skipping sample id=2484165. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2721020. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2742772. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2754092. Maximum sequence length: 2049, sample length: 3735 [default0]:Skipping sample id=2744158. Maximum sequence length: 2049, sample length: 4402 [default0]:Skipping sample id=2747061. Maximum sequence length: 2049, sample length: 4706 [default0]:Skipping sample id=2744300. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2729043. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2496268. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2737397. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2750501. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2714434. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2730081. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2713914. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2732461. 
Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2721240. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2718569. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2721423. Maximum sequence length: 2049, sample length: 5833 [default0]:Skipping sample id=2718760. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2720555. Maximum sequence length: 2049, sample length: 4157 [default0]:Skipping sample id=2735539. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2745475. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2735036. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2729129. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2712883. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2720169. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2739764. Maximum sequence length: 2049, sample length: 8121 [default0]:Skipping sample id=2730698. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2729864. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2488509. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2729105. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2718466. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2723887. Maximum sequence length: 2049, sample length: 4509 [default0]:Skipping sample id=2729404. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2730325. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2749812. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2728248. 
Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2734579. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2744467. Maximum sequence length: 2049, sample length: 4799 [default0]:Skipping sample id=2721753. Maximum sequence length: 2049, sample length: 4111 [default0]:Skipping sample id=2752375. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2734647. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2719399. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2718988. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2753762. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2724051. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2716728. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2727487. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2739712. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2487692. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2747609. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2742999. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2737583. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2737666. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2729802. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2729109. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2747422. Maximum sequence length: 2049, sample length: 3367 [default0]:Skipping sample id=2711665. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2728208. 
Maximum sequence length: 2049, sample length: 4004 [default0]:Skipping sample id=2752970. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2750295. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2753361. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2718695. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738706. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2487927. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2749280. Maximum sequence length: 2049, sample length: 3918 [default0]:Skipping sample id=2715292. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2728011. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2489463. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2734936. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2723923. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2493828. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2480641. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2486045. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2724673. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732482. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2723503. Maximum sequence length: 2049, sample length: 3541 [default0]:Skipping sample id=2742117. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2753079. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2736581. Maximum sequence length: 2049, sample length: 5977 [default0]:Skipping sample id=2728031. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2713642. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2754904. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2467724. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2479201. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724130. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2726909. Maximum sequence length: 2049, sample length: 5184 [default0]:Skipping sample id=2737607. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2731452. Maximum sequence length: 2049, sample length: 5978 [default0]:Skipping sample id=2731847. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2729639. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2731846. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2739429. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2485896. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2742466. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2736713. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2719913. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2732122. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2719029. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2723162. Maximum sequence length: 2049, sample length: 4140 [default0]:Skipping sample id=2749837. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2741188. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2492344. 
Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2746988. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2753775. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2741032. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2751965. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2741827. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2723632. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748789. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2727784. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2711376. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2748377. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2740987. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2720118. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2717518. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2743918. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2735756. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2751203. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2723795. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2744301. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2745029. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2720063. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2724644. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2722693. 
Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2489033. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2741153. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2722370. Maximum sequence length: 2049, sample length: 5150 [default0]:Skipping sample id=2731917. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2734522. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2712003. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2746365. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2732568. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2733399. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2747148. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2465937. Maximum sequence length: 2049, sample length: 3600 [default0]:Skipping sample id=2747983. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2716927. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2487639. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2717894. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2724499. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2753876. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2722358. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2719229. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2726789. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2748439. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2482319. 
Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2715156. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2754147. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2747791. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2741901. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2714742. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2716984. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2714153. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2725042. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2466770. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2727721. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2493314. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2477297. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2734992. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2747902. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2737948. Maximum sequence length: 2049, sample length: 6760 [default0]:Skipping sample id=2737306. Maximum sequence length: 2049, sample length: 3220 [default0]:Skipping sample id=2756918. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2729272. Maximum sequence length: 2049, sample length: 4142 [default0]:Skipping sample id=2481109. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2728319. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2732084. Maximum sequence length: 2049, sample length: 3859 [default0]:Skipping sample id=2713495. 
Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2734792. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2469777. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2754661. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2712350. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2737359. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739537. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2742871. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2721760. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2480274. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2711482. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2727788. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2493516. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2727734. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2722085. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2714536. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2747746. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2720689. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2743975. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2722832. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2740616. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2747632. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2470825. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2713665. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2743080. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2719043. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2737974. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2753851. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2727416. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2750321. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2738311. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2731757. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2721072. Maximum sequence length: 2049, sample length: 6524 [default0]:Skipping sample id=2720831. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2742230. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2739409. Maximum sequence length: 2049, sample length: 5609 [default0]:Skipping sample id=2745250. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2745033. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2731030. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2728490. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2750863. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2728558. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2715878. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2486146. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2483622. 
Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2756460. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2730907. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2713117. Maximum sequence length: 2049, sample length: 6543 [default0]:Skipping sample id=2751166. Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2713658. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2753192. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2748133. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733006. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2736892. Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2752419. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2743018. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2739663. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2722584. Maximum sequence length: 2049, sample length: 3210 [default0]:Skipping sample id=2734429. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2745419. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2715637. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2742842. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2712601. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2711291. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2715879. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2751815. Maximum sequence length: 2049, sample length: 2922 [default0]:Skipping sample id=2739555. 
Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2739011. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2750973. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2483202. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2739389. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2748142. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2741368. Maximum sequence length: 2049, sample length: 8032 [default0]:Skipping sample id=2733182. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2484126. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2730084. Maximum sequence length: 2049, sample length: 4560 [default0]:Skipping sample id=2723062. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2718329. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2722324. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2724582. Maximum sequence length: 2049, sample length: 4193 [default0]:Skipping sample id=2732342. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2722771. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2755951. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2716763. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2733710. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2711016. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2724925. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2726710. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2721950. 
Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2731226. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2746476. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2479033. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2483875. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2721674. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2724167. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2489672. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2719921. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2722564. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2726962. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2745731. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2728614. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2726669. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2727817. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2714067. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2495131. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2751187. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2465914. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2482435. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2730161. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2723675. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2755121. 
Maximum sequence length: 2049, sample length: 5650 [default0]:Skipping sample id=2730077. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2757009. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2719907. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2710987. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2715346. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2718631. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2751867. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724843. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2729853. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2727810. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717609. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2754097. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2729862. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2751903. Maximum sequence length: 2049, sample length: 3849 [default0]:Skipping sample id=2736093. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2711862. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2756004. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2745632. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2720035. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2737292. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2737017. Maximum sequence length: 2049, sample length: 4619 [default0]:Skipping sample id=2748089. 
Maximum sequence length: 2049, sample length: 7077 [default0]:Skipping sample id=2727476. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2711259. Maximum sequence length: 2049, sample length: 4411 [default0]:Skipping sample id=2495899. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2731497. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2752098. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2748700. Maximum sequence length: 2049, sample length: 4774 [default0]:Skipping sample id=2478825. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2713259. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2742698. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2478469. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2483972. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2747826. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2741593. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2490684. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2720306. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2749582. Maximum sequence length: 2049, sample length: 3043 [default0]:Skipping sample id=2736261. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2737538. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2749315. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2494952. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2733900. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2493013. 
Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2723381. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2729680. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2711918. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2754269. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2711908. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2740459. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2735850. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2752786. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2746066. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2736519. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2739749. Maximum sequence length: 2049, sample length: 4466 [default0]:Skipping sample id=2736105. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2740697. Maximum sequence length: 2049, sample length: 3502 [default0]:Skipping sample id=2734195. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2753056. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2744029. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2725859. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2723096. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2751024. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2467091. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2731719. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2724359. 
Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717572. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2733028. Maximum sequence length: 2049, sample length: 4377 [default0]:Skipping sample id=2726620. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2718767. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2740153. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2712182. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2751541. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2730280. Maximum sequence length: 2049, sample length: 6616 [default0]:Skipping sample id=2756722. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2734267. Maximum sequence length: 2049, sample length: 5678 [default0]:Skipping sample id=2720562. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2718384. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2730950. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2711405. Maximum sequence length: 2049, sample length: 6073 [default0]:Skipping sample id=2714572. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2742445. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2719640. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2753427. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2753457. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2713032. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2756314. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2732168. 
Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719348. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2755123. Maximum sequence length: 2049, sample length: 5081 [default0]:Skipping sample id=2716343. Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2737746. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2724919. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2724230. Maximum sequence length: 2049, sample length: 3789 [default0]:Skipping sample id=2717920. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2721195. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2729135. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2730687. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2722087. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2483177. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2752693. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2745379. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2731674. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2751368. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2756175. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756301. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2730284. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2722461. Maximum sequence length: 2049, sample length: 5950 [default0]:Skipping sample id=2753055. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2745037. 
Maximum sequence length: 2049, sample length: 6556 [default0]:Skipping sample id=2723822. Maximum sequence length: 2049, sample length: 4208 [default0]:Skipping sample id=2724318. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2756358. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2743251. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2750306. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2717929. Maximum sequence length: 2049, sample length: 3187 [default0]:Skipping sample id=2756938. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2489682. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2720840. Maximum sequence length: 2049, sample length: 4211 [default0]:Skipping sample id=2724319. Maximum sequence length: 2049, sample length: 4043 [default0]:Skipping sample id=2730506. Maximum sequence length: 2049, sample length: 5600 [default0]:Skipping sample id=2466699. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2730324. Maximum sequence length: 2049, sample length: 3714 [default0]:Skipping sample id=2726654. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2736258. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2467859. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2712158. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2727204. Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2722348. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2744217. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2730220. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2493587. 
Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2720120. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2755640. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2744992. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2753121. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2744676. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2727377. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2466642. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2717393. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2726969. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2742449. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2723880. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2717287. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2751883. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2732160. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2736175. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2737164. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2742001. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2488917. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2484896. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2719099. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2725691. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2746728. 
Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2743902. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2746748. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2746234. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2753338. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2746903. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2755197. Maximum sequence length: 2049, sample length: 5172 [default0]:Skipping sample id=2746948. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2729891. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2722583. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2722837. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2726051. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2738549. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2742591. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2740579. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2723713. Maximum sequence length: 2049, sample length: 5724 [default0]:Skipping sample id=2734265. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2730736. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2752505. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2718664. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2749400. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2750940. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2751565. 
Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2736515. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2745994. Maximum sequence length: 2049, sample length: 5861 [default0]:Skipping sample id=2747121. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2492724. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2739926. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2729215. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2492291. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2751546. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2738242. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2720968. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2720824. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2731136. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2726112. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2743969. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2735104. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2487814. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2737047. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2469876. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2714600. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2738958. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2747845. Maximum sequence length: 2049, sample length: 3122 [default0]:Skipping sample id=2730594. 
Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2729630. Maximum sequence length: 2049, sample length: 5197 [default0]:Skipping sample id=2728595. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2730690. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2727912. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2493241. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2721091. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2740186. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2465792. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2750962. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2754165. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2743131. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2751756. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2713269. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2732187. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2713136. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2739143. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2712562. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2753433. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754155. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2720697. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2710995. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2752159. 
Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2756366. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2746623. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2734495. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2745035. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2468814. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2729265. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2751846. Maximum sequence length: 2049, sample length: 6246 [default0]:Skipping sample id=2717471. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2478493. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2482064. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2749103. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2713979. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2721261. Maximum sequence length: 2049, sample length: 3810 [default0]:Skipping sample id=2716917. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2716469. Maximum sequence length: 2049, sample length: 6661 [default0]:Skipping sample id=2736435. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2717856. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2742735. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2731921. Maximum sequence length: 2049, sample length: 6063 [default0]:Skipping sample id=2723055. Maximum sequence length: 2049, sample length: 4914 [default0]:Skipping sample id=2745857. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2728893. 
Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2723196. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2752048. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2482604. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2713806. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2732511. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2714697. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2747402. Maximum sequence length: 2049, sample length: 3205 [default0]:Skipping sample id=2753580. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2725050. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2749922. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2755693. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2713718. Maximum sequence length: 2049, sample length: 4613 [default0]:Skipping sample id=2728587. Maximum sequence length: 2049, sample length: 3055 [default0]:Skipping sample id=2496057. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2733199. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2485679. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2750160. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2751968. Maximum sequence length: 2049, sample length: 5873 [default0]:Skipping sample id=2480331. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2744624. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2719290. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2728600. 
Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2740914. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2728193. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2479644. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2754760. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2730920. Maximum sequence length: 2049, sample length: 5136 [default0]:Skipping sample id=2747665. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2733506. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2732150. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2732424. Maximum sequence length: 2049, sample length: 4453 [default0]:Skipping sample id=2728968. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2734404. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2483564. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2728154. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2723756. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2719615. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2737675. Maximum sequence length: 2049, sample length: 14223 [default0]:Skipping sample id=2743558. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2488482. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2487848. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2715440. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2488365. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2494573. 
Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2724881. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2753430. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2494343. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2490918. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2731569. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2716792. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2740536. Maximum sequence length: 2049, sample length: 4573 [default0]:Skipping sample id=2733775. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2485024. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2491146. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2723249. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2749362. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2719782. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2737121. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2735565. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2726000. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2722532. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2735255. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2719746. Maximum sequence length: 2049, sample length: 6614 [default0]:Skipping sample id=2468007. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2734053. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2749984. 
Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2723635. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2721715. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2728069. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2745705. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2743468. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2726430. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2723790. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2756235. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2747051. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2741054. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2728463. Maximum sequence length: 2049, sample length: 4858 [default0]:Skipping sample id=2754991. Maximum sequence length: 2049, sample length: 3052 [default0]:Skipping sample id=2745671. Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2732005. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2729499. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714043. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2718444. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2484452. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2756441. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2730764. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2719577. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2721507. 
Maximum sequence length: 2049, sample length: 5208 [default0]:Skipping sample id=2746836. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2726550. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2724269. Maximum sequence length: 2049, sample length: 4377 [default0]:Skipping sample id=2748660. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2714645. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2716545. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2738151. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2735689. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2724735. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2746070. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2734930. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2752102. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2718577. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2727333. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2720766. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2714943. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2733371. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2712800. Maximum sequence length: 2049, sample length: 3187 [default0]:Skipping sample id=2734597. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2722618. Maximum sequence length: 2049, sample length: 4984 [default0]:Skipping sample id=2470605. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2736878. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2755637. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2735456. Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2751273. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2718402. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2483329. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2725198. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2729827. Maximum sequence length: 2049, sample length: 6853 [default0]:Skipping sample id=2490926. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2754078. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2725821. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2716800. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2732607. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2711215. Maximum sequence length: 2049, sample length: 7329 [default0]:Skipping sample id=2753074. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2722944. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2737843. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2742000. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2714362. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2744902. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2741239. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2717141. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2753458. 
Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2492868. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2747336. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2735561. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2723428. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2731453. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2735356. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2745750. Maximum sequence length: 2049, sample length: 4467 [default0]:Skipping sample id=2716505. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2735140. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2468858. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2715904. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2711059. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2481983. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2470650. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2719670. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2711084. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2725019. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2746808. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2727584. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2719597. Maximum sequence length: 2049, sample length: 3651 [default0]:Skipping sample id=2711618. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2711546. 
Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2744477. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2750173. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2749418. Maximum sequence length: 2049, sample length: 5110 [default0]:Skipping sample id=2753002. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2747924. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2495080. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2726473. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2755385. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2714361. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2733165. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2725767. Maximum sequence length: 2049, sample length: 5170 [default0]:Skipping sample id=2729447. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2723673. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2484760. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2746304. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2752303. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2734640. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2754029. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2740328. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2489118. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2727005. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2741184. 
Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2732842. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2481288. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2739798. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2728528. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2726286. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2721237. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718273. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2744008. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2739604. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2727618. Maximum sequence length: 2049, sample length: 5507 [default0]:Skipping sample id=2721612. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2723394. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2742273. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2757036. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2730718. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2747070. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2467291. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2757007. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2750374. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2489778. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2466843. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2744905. 
Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2737375. Maximum sequence length: 2049, sample length: 5078 [default0]:Skipping sample id=2744148. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2739378. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2488606. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2729091. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2483452. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2732304. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2734759. Maximum sequence length: 2049, sample length: 5197 [default0]:Skipping sample id=2715381. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714496. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2742101. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2733614. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2713654. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2745536. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2486624. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2751824. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2486171. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2756996. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2725015. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2743204. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2754064. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2751466. 
Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2753770. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2742254. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2742781. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2754081. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2737443. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2733154. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2719067. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2498703. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2738150. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2732596. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2749749. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2716172. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2735288. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2713682. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2755075. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2711892. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2732429. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2714539. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2738820. Maximum sequence length: 2049, sample length: 5535 [default0]:Skipping sample id=2723686. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2494230. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2724627. 
Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2748582. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2750332. Maximum sequence length: 2049, sample length: 6247 [default0]:Skipping sample id=2732345. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2730274. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2754706. Maximum sequence length: 2049, sample length: 6635 [default0]:Skipping sample id=2751963. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2752250. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2722151. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2712179. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2744403. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2743618. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2499119. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2722033. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2721035. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2711658. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2729194. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2752357. Maximum sequence length: 2049, sample length: 4169 [default0]:Skipping sample id=2714101. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2490760. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2747212. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2749516. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2483383. 
Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2735361. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2739142. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2711855. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2738827. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2756936. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2718742. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2743193. Maximum sequence length: 2049, sample length: 3558 [default0]:Skipping sample id=2726179. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2738759. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2745951. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2733507. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2469119. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2753133. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2498488. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2717353. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2729405. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2722578. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2723003. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2716215. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2716506. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2755993. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2478946. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2718414. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2746317. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2739396. Maximum sequence length: 2049, sample length: 5552 [default0]:Skipping sample id=2726821. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2731098. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2749450. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2744461. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2756568. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2746692. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2755393. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2756461. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2751539. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2716181. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2721250. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2712357. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2716228. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2747140. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2720174. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2717479. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2743088. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2715361. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2746858. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2719544. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2715577. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2467429. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2477016. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2730428. Maximum sequence length: 2049, sample length: 5221 [default0]:Skipping sample id=2738194. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2744777. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2712835. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2738483. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2749097. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2727390. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2737458. Maximum sequence length: 2049, sample length: 7102 [default0]:Skipping sample id=2742391. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2482773. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756968. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2717918. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2744572. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2741540. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2494252. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2717304. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2751383. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2756420. 
Maximum sequence length: 2049, sample length: 6404 [default0]:Skipping sample id=2747222. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2749686. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2737644. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2495200. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2714574. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2751584. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2753207. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2711805. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2747675. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2727110. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2740162. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2742563. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2738440. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2734190. Maximum sequence length: 2049, sample length: 6800 [default0]:Skipping sample id=2734183. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2721637. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2719516. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2711519. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2738800. Maximum sequence length: 2049, sample length: 6967 [default0]:Skipping sample id=2724139. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2493261. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2734576. 
Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2712095. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2744948. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2732675. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2718624. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2734945. Maximum sequence length: 2049, sample length: 6455 [default0]:Skipping sample id=2735081. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2490064. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2739088. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2495030. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2725896. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2740999. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2711524. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2723415. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2731323. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2723915. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2470506. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2740564. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2750788. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2753494. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2495463. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2723793. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2747681. 
Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2745300. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2730631. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2752508. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2740156. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2747849. Maximum sequence length: 2049, sample length: 4696 [default0]:Skipping sample id=2722418. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2470795. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2728416. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2735948. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2470693. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2755969. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2737093. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2731783. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2750837. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2753838. Maximum sequence length: 2049, sample length: 4023 [default0]:Skipping sample id=2741442. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2729291. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2733893. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2717146. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2731467. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2716485. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2733298. 
Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2711789. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2729642. Maximum sequence length: 2049, sample length: 4291 [default0]:Skipping sample id=2747451. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2717750. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2736793. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2729119. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2735435. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2731514. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2717552. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2723830. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2724009. Maximum sequence length: 2049, sample length: 5695 [default0]:Skipping sample id=2711512. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2715632. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2754846. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2723339. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2738530. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2729488. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2469202. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2727213. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2721343. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2724146. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2744712. 
Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2755453. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2724180. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2721933. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2728828. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2725915. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2496510. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2716378. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2732141. Maximum sequence length: 2049, sample length: 6050 [default0]:Skipping sample id=2749635. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2743982. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2756588. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2737796. Maximum sequence length: 2049, sample length: 4952 [default0]:Skipping sample id=2755461. Maximum sequence length: 2049, sample length: 6245 [default0]:Skipping sample id=2727052. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2730234. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2471154. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2731359. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2746214. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2751577. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2494031. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2736218. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2737095. 
Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2749729. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2750355. Maximum sequence length: 2049, sample length: 6425 [default0]:Skipping sample id=2756146. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2498864. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2479321. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2746184. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2731110. Maximum sequence length: 2049, sample length: 8151 [default0]:Skipping sample id=2468881. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2729550. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2724630. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2736286. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2752434. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2469340. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2727673. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2494373. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2737143. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2748270. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2466513. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2722200. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2734322. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2740551. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2739853. 
Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2747895. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2712571. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2711592. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2749596. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2735170. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2743601. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2716177. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2740168. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2471289. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2716678. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2750983. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2727260. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2498051. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2726160. Maximum sequence length: 2049, sample length: 3590 [default0]:Skipping sample id=2483099. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2749738. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2484432. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2734046. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2739864. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2720788. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2724560. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2722655. 
Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2484481. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2736464. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2727470. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2733679. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2743208. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2721591. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2745047. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2735593. Maximum sequence length: 2049, sample length: 7775 [default0]:Skipping sample id=2723881. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2490646. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2748127. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2746139. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2466612. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2728148. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2733862. Maximum sequence length: 2049, sample length: 4351 [default0]:Skipping sample id=2714887. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2477251. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2740747. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2712034. Maximum sequence length: 2049, sample length: 7271 [default0]:Skipping sample id=2488226. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2733466. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2716701. 
Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2719108. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2751081. Maximum sequence length: 2049, sample length: 5758 [default0]:Skipping sample id=2489685. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2735695. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2732249. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2712314. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2739915. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2756110. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2720893. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2499377. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2730994. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2492318. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2481220. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749358. Maximum sequence length: 2049, sample length: 5536 [default0]:Skipping sample id=2492363. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2477766. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2737611. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2753738. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2728270. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2731980. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2754221. Maximum sequence length: 2049, sample length: 6216 [default0]:Skipping sample id=2738368. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2723222. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2739037. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2718412. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2717984. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2496990. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2711560. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2720293. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2755804. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2755597. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2729507. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2748224. Maximum sequence length: 2049, sample length: 6264 [default0]:Skipping sample id=2711047. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2732714. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2713824. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2716301. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2483667. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2711787. Maximum sequence length: 2049, sample length: 6934 [default0]:Skipping sample id=2711861. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2726894. Maximum sequence length: 2049, sample length: 4681 [default0]:Skipping sample id=2747319. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2728524. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2718653. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2744366. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2735206. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2713652. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2726610. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2756609. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2727899. Maximum sequence length: 2049, sample length: 4376 [default0]:Skipping sample id=2722660. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2735341. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2719648. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2748125. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2716772. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2733052. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2720141. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2749875. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2736107. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2489360. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2730079. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2729107. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2750807. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2750088. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2498385. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2498943. 
Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2750947. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2749739. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2480057. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2486693. Maximum sequence length: 2049, sample length: 4277 [default0]:Skipping sample id=2720419. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2719077. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2478164. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2484707. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2737062. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2741870. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2751645. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2494484. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2713021. Maximum sequence length: 2049, sample length: 6409 [default0]:Skipping sample id=2724832. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2752489. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2715360. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2735685. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2750010. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2749688. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2752096. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2484157. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2752512. 
Maximum sequence length: 2049, sample length: 4083 [default0]:Skipping sample id=2739735. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2711638. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2738974. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2481381. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2740106. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2754942. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2715220. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2481722. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2732966. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2726463. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2719665. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2743162. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2753799. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2751110. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2738313. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2717394. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2746680. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2743674. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2725660. Maximum sequence length: 2049, sample length: 3826 [default0]:Skipping sample id=2713735. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2731591. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2494104. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2749417. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2744927. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2743659. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2479566. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2732065. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2495951. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2736549. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2731215. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2467562. Maximum sequence length: 2049, sample length: 3237 [default0]:Skipping sample id=2756389. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2715410. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2748835. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2498565. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2752397. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2748897. Maximum sequence length: 2049, sample length: 4594 [default0]:Skipping sample id=2714314. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2715791. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2756701. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2748764. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2749408. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2717855. Maximum sequence length: 2049, sample length: 6228 [default0]:Skipping sample id=2755740. 
Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2744509. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2755325. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2742240. Maximum sequence length: 2049, sample length: 5523 [default0]:Skipping sample id=2748128. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2739574. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2745981. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2745876. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2743094. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2715428. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2716483. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2731382. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2736731. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2488799. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2751016. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2713161. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2715135. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2725326. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2751392. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2738573. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2710994. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2743844. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2751996. 
Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2714931. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2712073. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2729754. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2737079. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2742635. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2748326. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2750497. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2484546. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2746018. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2736124. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2724823. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2497975. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2740537. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2754038. Maximum sequence length: 2049, sample length: 4914 [default0]:Skipping sample id=2739558. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2497383. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2721111. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2725764. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2753085. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2493998. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2713170. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2741937. 
Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2752525. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2735935. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2721518. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2726026. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2712581. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2725056. Maximum sequence length: 2049, sample length: 5242 [default0]:Skipping sample id=2711148. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2750841. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2482837. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2713523. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2721322. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2748949. Maximum sequence length: 2049, sample length: 3953 [default0]:Skipping sample id=2740858. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2741780. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727607. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2731431. Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2467951. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2729338. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2715035. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2719976. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2738787. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2737848. 
Maximum sequence length: 2049, sample length: 5801 [default0]:Skipping sample id=2738556. Maximum sequence length: 2049, sample length: 5615 [default0]:Skipping sample id=2485266. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2743945. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2721182. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2731407. Maximum sequence length: 2049, sample length: 4372 [default0]:Skipping sample id=2734110. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2736267. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2741722. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2726278. Maximum sequence length: 2049, sample length: 4389 [default0]:Skipping sample id=2733335. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2491184. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2757116. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2746754. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2753308. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2742114. Maximum sequence length: 2049, sample length: 6489 [default0]:Skipping sample id=2488409. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2728752. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2723701. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2716448. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2732713. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2725751. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2716199. 
Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2722469. Maximum sequence length: 2049, sample length: 5353 [default0]:Skipping sample id=2482253. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2486350. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2756783. Maximum sequence length: 2049, sample length: 3795 [default0]:Skipping sample id=2717819. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2489704. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2711772. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2736190. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2727440. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2719107. Maximum sequence length: 2049, sample length: 5423 [default0]:Skipping sample id=2738788. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2722022. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2733156. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2729398. Maximum sequence length: 2049, sample length: 4360 [default0]:Skipping sample id=2749443. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2731534. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2717022. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2725509. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2729049. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2492283. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2749807. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2728550. 
Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2753684. Maximum sequence length: 2049, sample length: 4663 [default0]:Skipping sample id=2483263. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2747546. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2719628. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2734724. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2489969. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2714585. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2741234. Maximum sequence length: 2049, sample length: 4524 [default0]:Skipping sample id=2738948. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2739147. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2752282. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2742622. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2744507. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2711167. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2721484. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2719625. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2756706. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2716407. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2746313. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2721678. Maximum sequence length: 2049, sample length: 3423 [default0]:Skipping sample id=2716633. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715563. 
Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719311. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2718121. Maximum sequence length: 2049, sample length: 5381 [default0]:Skipping sample id=2729977. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2726935. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2492245. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2746411. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2715541. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2748808. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2734280. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2712384. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2741622. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2481047. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2733300. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2740889. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2734087. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2751412. Maximum sequence length: 2049, sample length: 5054 [default0]:Skipping sample id=2720978. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2724066. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2715987. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2711064. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2715387. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2738046. 
Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2743517. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2493282. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2725134. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2744499. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2752179. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2745887. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2724438. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2719809. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2717391. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2726186. Maximum sequence length: 2049, sample length: 4574 [default0]:Skipping sample id=2737191. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2745762. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2483651. Maximum sequence length: 2049, sample length: 4093 [default0]:Skipping sample id=2711043. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2723153. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2488133. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2716078. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2741033. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2736460. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2755506. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2730332. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2751660. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2752714. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2733080. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2711798. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2743519. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2732404. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2491349. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2731734. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2733283. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2725937. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2483323. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2748223. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2724694. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2494242. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2722685. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2748569. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2496788. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2716862. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2492950. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2716790. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2744652. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2739967. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2746104. 
Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2731771. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2718323. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2750525. Maximum sequence length: 2049, sample length: 4404 [default0]:Skipping sample id=2755468. Maximum sequence length: 2049, sample length: 4560 [default0]:Skipping sample id=2743510. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2711775. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2731944. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2725558. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2748541. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2713837. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2712767. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2491909. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2739783. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2753917. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2745513. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2717872. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2728430. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2718659. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2743038. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2743158. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2723170. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2731996. 
Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2731351. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2470034. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2754175. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2742546. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2730710. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2487558. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2755170. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2477115. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2754010. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2713006. Maximum sequence length: 2049, sample length: 5042 [default0]:Skipping sample id=2750353. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2478500. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2730607. Maximum sequence length: 2049, sample length: 5954 [default0]:Skipping sample id=2740514. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2729861. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2756378. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2750399. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2726387. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2497868. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2739562. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2721342. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2727951. 
Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2731594. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2729466. Maximum sequence length: 2049, sample length: 3654 [default0]:Skipping sample id=2727826. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2492523. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2723782. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2488653. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2720531. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2713779. Maximum sequence length: 2049, sample length: 5121 [default0]:Skipping sample id=2719132. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2733667. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2749785. Maximum sequence length: 2049, sample length: 4566 [default0]:Skipping sample id=2486219. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2722468. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2741607. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2712462. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2712329. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2714113. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2725853. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2732640. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2746976. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2736359. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2729952. 
Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2711538. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2720546. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2737283. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2731825. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2723495. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2734414. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2738121. Maximum sequence length: 2049, sample length: 4853 [default0]:Skipping sample id=2755430. Maximum sequence length: 2049, sample length: 3817 [default0]:Skipping sample id=2754248. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2751482. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2727562. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2498904. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2747706. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2723816. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2738434. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2488019. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2731841. Maximum sequence length: 2049, sample length: 4428 [default0]:Skipping sample id=2746525. Maximum sequence length: 2049, sample length: 3790 [default0]:Skipping sample id=2725601. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2483764. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2748000. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2719614. 
Maximum sequence length: 2049, sample length: 4424 [default0]:Skipping sample id=2747244. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724830. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2720392. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2749642. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2728720. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2734587. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2467563. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2715477. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2713789. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2742888. Maximum sequence length: 2049, sample length: 5316 [default0]:Skipping sample id=2748694. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2481959. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2755663. Maximum sequence length: 2049, sample length: 4044 [default0]:Skipping sample id=2730923. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2728640. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2756958. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2735909. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2724172. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2748973. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2734236. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2713351. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2718857. 
Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2745588. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2486052. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2744837. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737507. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2713903. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2748115. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2746198. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2747463. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2742692. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2742840. Maximum sequence length: 2049, sample length: 6256 [default0]:Skipping sample id=2742637. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2754963. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2725747. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2726204. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2748629. Maximum sequence length: 2049, sample length: 6108 [default0]:Skipping sample id=2731922. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2717887. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2747398. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2739312. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2722433. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740183. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2731632. 
Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2484204. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2485346. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2736640. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2493563. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2741941. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2730814. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2724289. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2748950. Maximum sequence length: 2049, sample length: 4113 [default0]:Skipping sample id=2748470. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2749852. Maximum sequence length: 2049, sample length: 5820 [default0]:Skipping sample id=2717277. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2494561. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2496585. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2745204. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2733576. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2754725. Maximum sequence length: 2049, sample length: 4805 [default0]:Skipping sample id=2740297. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2725323. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2753475. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2749982. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2722427. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2466019. 
Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2728910. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2749484. Maximum sequence length: 2049, sample length: 4140 [default0]:Skipping sample id=2751619. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2755024. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749474. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2742107. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2724038. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2742803. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2751222. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2487875. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2747055. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2755424. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2491606. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2711983. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2740121. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2471032. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2724552. Maximum sequence length: 2049, sample length: 4336 [default0]:Skipping sample id=2733402. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2723279. Maximum sequence length: 2049, sample length: 6416 [default0]:Skipping sample id=2743806. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2756755. Maximum sequence length: 2049, sample length: 8038 [default0]:Skipping sample id=2741821. 
Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2732147. Maximum sequence length: 2049, sample length: 4178 [default0]:Skipping sample id=2747790. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2711435. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2711145. Maximum sequence length: 2049, sample length: 5202 [default0]:Skipping sample id=2755414. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2722759. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2747751. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2729985. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2754907. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2738925. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2728035. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2753064. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2731331. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2494605. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2738234. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2715330. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2720826. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2482929. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2728247. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2748300. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2741271. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2718147. 
Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2466552. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2483031. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2713195. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2729907. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2476982. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2731421. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2744358. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2723382. Maximum sequence length: 2049, sample length: 4378 [default0]:Skipping sample id=2726807. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2477379. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2490343. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733834. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2721074. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2745643. Maximum sequence length: 2049, sample length: 3478 [default0]:Skipping sample id=2715352. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2714425. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2717295. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2716189. Maximum sequence length: 2049, sample length: 4823 [default0]:Skipping sample id=2734946. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2755136. Maximum sequence length: 2049, sample length: 4472 [default0]:Skipping sample id=2724575. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2485017. 
Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2732133. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2728120. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2716031. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2747401. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2732721. Maximum sequence length: 2049, sample length: 4739 [default0]:Skipping sample id=2734398. Maximum sequence length: 2049, sample length: 4660 [default0]:Skipping sample id=2754093. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750795. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2728700. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2747197. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2745455. Maximum sequence length: 2049, sample length: 3575 [default0]:Skipping sample id=2727351. Maximum sequence length: 2049, sample length: 4150 [default0]:Skipping sample id=2735479. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2731668. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2487965. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2494866. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2735047. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2743210. Maximum sequence length: 2049, sample length: 4321 [default0]:Skipping sample id=2725610. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2735294. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2726194. Maximum sequence length: 2049, sample length: 4569 [default0]:Skipping sample id=2467620. 
Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732617. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2467261. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2719816. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2720478. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2726843. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2742243. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2732275. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2719932. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2733105. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2753918. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2747309. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2730227. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2744629. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2714710. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2711742. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719717. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2736775. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2756058. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2726514. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2723189. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2733113. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2711955. 
Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2736931. Maximum sequence length: 2049, sample length: 4356 [default0]:Skipping sample id=2745263. Maximum sequence length: 2049, sample length: 7506 [default0]:Skipping sample id=2485044. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2482226. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2753100. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2754315. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2727821. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2716405. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2720522. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2495304. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2722994. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2725925. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2486819. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2477738. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2753040. Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2719925. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2746417. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2754590. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2743602. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2497867. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2714081. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2741730. 
Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2711916. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2724620. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2725652. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726445. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2737679. Maximum sequence length: 2049, sample length: 7067 [default0]:Skipping sample id=2478441. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2723525. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2469599. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2750074. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2730539. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2470799. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2484572. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2496116. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2749020. Maximum sequence length: 2049, sample length: 6442 [default0]:Skipping sample id=2738351. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2727790. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2714121. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2722056. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2719491. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2467357. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2482334. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2720285. 
Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2739111. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2755658. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2734510. Maximum sequence length: 2049, sample length: 8496 [default0]:Skipping sample id=2725253. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2732845. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2751059. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2754335. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2739412. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2719962. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2715540. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2486197. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2732308. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2716389. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2713672. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2714112. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2721058. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2753590. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2745541. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2724287. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2754582. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2721143. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2734068. 
Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2723282. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2736434. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2713902. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2722199. Maximum sequence length: 2049, sample length: 5327 [default0]:Skipping sample id=2754703. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2728501. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2721044. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2738183. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2735199. Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2725797. Maximum sequence length: 2049, sample length: 3766 [default0]:Skipping sample id=2737157. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2489318. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2745386. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2713939. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2722872. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2738308. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2742662. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2724794. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2716961. Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2721133. Maximum sequence length: 2049, sample length: 4214 [default0]:Skipping sample id=2718618. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2754285. 
Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2755054. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2725677. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2756003. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2749244. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2743535. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2716913. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2729525. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2489875. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2729327. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2721214. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2748051. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2489570. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2744582. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2732883. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2487329. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2729785. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2714847. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2726432. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2756915. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2742985. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2494421. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2496382. 
Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2745866. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2723420. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2495366. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2716721. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2750534. Maximum sequence length: 2049, sample length: 4045 [default0]:Skipping sample id=2479395. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2723136. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2711854. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2482244. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2713448. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2741065. Maximum sequence length: 2049, sample length: 4497 [default0]:Skipping sample id=2722942. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2745024. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2747534. Maximum sequence length: 2049, sample length: 4136 [default0]:Skipping sample id=2734031. Maximum sequence length: 2049, sample length: 5248 [default0]:Skipping sample id=2721173. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2727892. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742490. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2751872. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2751218. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2746823. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2713909. 
Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2747908. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2713024. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2750912. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2720053. Maximum sequence length: 2049, sample length: 4597 [default0]:Skipping sample id=2722347. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2733559. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2721991. Maximum sequence length: 2049, sample length: 4517 [default0]:Skipping sample id=2740429. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2722818. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2492864. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2753455. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2715509. Maximum sequence length: 2049, sample length: 5163 [default0]:Skipping sample id=2489791. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753033. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2727729. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2755274. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2739897. Maximum sequence length: 2049, sample length: 4425 [default0]:Skipping sample id=2748761. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2734616. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2738273. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2718197. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2747064. 
Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2718710. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2756393. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2716276. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2725266. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2729705. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2740984. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2738564. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2720199. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2721588. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2726440. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2718754. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2732924. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2485398. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2712912. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2739520. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2724797. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2468589. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2742010. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2733633. Maximum sequence length: 2049, sample length: 6533 [default0]:Skipping sample id=2486652. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2717710. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2747002. 
Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2719466. Maximum sequence length: 2049, sample length: 5554 [default0]:Skipping sample id=2748147. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2730097. Maximum sequence length: 2049, sample length: 5512 [default0]:Skipping sample id=2724277. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2740758. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2483388. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2713803. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2497708. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2756384. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2466906. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2727672. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2735197. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2727453. Maximum sequence length: 2049, sample length: 6428 [default0]:Skipping sample id=2469332. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2718599. Maximum sequence length: 2049, sample length: 6151 [default0]:Skipping sample id=2727147. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2720188. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2757064. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2738064. Maximum sequence length: 2049, sample length: 3558 [default0]:Skipping sample id=2717163. Maximum sequence length: 2049, sample length: 7271 [default0]:Skipping sample id=2721010. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2488623. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2717874. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2480101. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2727401. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2751988. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2751295. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2731376. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2489502. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2753829. Maximum sequence length: 2049, sample length: 4773 [default0]:Skipping sample id=2484390. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2714614. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2489509. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2729390. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2720057. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2718541. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2726296. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2721162. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2750337. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2749461. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2717724. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2466264. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2719451. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2741124. 
Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2735760. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2751106. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2739370. Maximum sequence length: 2049, sample length: 3638 [default0]:Skipping sample id=2714612. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2498485. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2713769. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2756823. Maximum sequence length: 2049, sample length: 3130 [default0]:Skipping sample id=2733801. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2747407. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2490306. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2743724. Maximum sequence length: 2049, sample length: 5271 [default0]:Skipping sample id=2737864. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2735278. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2722478. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2721265. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2738691. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2746698. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2752837. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2717967. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2721476. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2738826. Maximum sequence length: 2049, sample length: 4250 [default0]:Skipping sample id=2731438. 
Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2721158. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2711126. Maximum sequence length: 2049, sample length: 5938 [default0]:Skipping sample id=2747935. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2722212. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2737698. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2745210. Maximum sequence length: 2049, sample length: 4554 [default0]:Skipping sample id=2719208. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2712506. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2752269. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2721167. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2751553. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2722735. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2723642. Maximum sequence length: 2049, sample length: 4480 [default0]:Skipping sample id=2715175. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2744054. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2713546. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2741774. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2714392. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2499309. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2724839. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2725574. Maximum sequence length: 2049, sample length: 4189 [default0]:Skipping sample id=2715662. 
Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2723606. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2723938. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2723008. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2727708. Maximum sequence length: 2049, sample length: 6482 [default0]:Skipping sample id=2727229. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2488769. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2722106. Maximum sequence length: 2049, sample length: 4388 [default0]:Skipping sample id=2752950. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2740585. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2481584. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2739377. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2498263. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2756946. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2741648. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2481388. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2738043. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2745413. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2733454. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2750026. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2731586. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2747373. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2721465. 
Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2721403. Maximum sequence length: 2049, sample length: 5209 [default0]:Skipping sample id=2735881. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2739806. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2743860. Maximum sequence length: 2049, sample length: 3532 [default0]:Skipping sample id=2741236. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2734433. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2734375. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2744959. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2735956. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2716123. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2721463. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2757063. Maximum sequence length: 2049, sample length: 4139 [default0]:Skipping sample id=2725793. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719241. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2723377. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2717443. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2722747. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2718067. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2737347. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2724941. Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2718410. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2717308. 
Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2743254. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2715717. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2734768. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2717203. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2711817. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2726676. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2467616. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2737606. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2717813. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2494632. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2480002. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2482860. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2717143. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2734878. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2755724. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2477238. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2731958. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2718470. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2738510. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2714284. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2748095. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2491284. 
Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2741696. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2752876. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2733494. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2754941. Maximum sequence length: 2049, sample length: 3461 [default0]:Skipping sample id=2714675. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2752496. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2754232. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2730185. Maximum sequence length: 2049, sample length: 5225 [default0]:Skipping sample id=2753743. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2744166. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2719678. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2751931. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2756415. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2745072. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2710983. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2736530. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2752977. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2740335. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2483111. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2754704. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2745034. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2739010. 
Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2728125. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2729786. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2719994. Maximum sequence length: 2049, sample length: 3880 [default0]:Skipping sample id=2728441. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2735202. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2732978. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2755287. Maximum sequence length: 2049, sample length: 4094 [default0]:Skipping sample id=2730943. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2736504. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2484082. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2713888. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2736211. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2746019. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2736688. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2731289. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2733881. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2735236. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2754750. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2716894. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2742045. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2724298. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2714940. 
Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2730169. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2491076. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2736224. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2730976. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2736868. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2720475. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2751742. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2731116. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2732270. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2722642. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2750333. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2730375. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2727572. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2715779. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2742146. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2728365. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2740351. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2490636. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2720477. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2725635. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2494929. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2721057. 
Maximum sequence length: 2049, sample length: 4567 [default0]:Skipping sample id=2743345. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2747077. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2715924. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2723195. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2714594. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2717568. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2743788. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2714003. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2733785. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2720672. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2482023. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2737260. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2719026. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2757069. Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2715273. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2749752. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2738898. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2737662. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2732359. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2733005. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2718356. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2745548. 
Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2740569. Maximum sequence length: 2049, sample length: 4530 [default0]:Skipping sample id=2730926. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2717121. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2739374. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2712589. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2721161. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2718494. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2725686. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2737450. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2749975. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731072. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2744019. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2738583. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2494355. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2739171. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2495732. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2754527. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2741436. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2747738. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2483748. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2711225. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2713910. 
Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2728324. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2739060. Maximum sequence length: 2049, sample length: 3130 [default0]:Skipping sample id=2487324. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2740909. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2721621. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2742966. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2487487. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2720685. Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2488506. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2740253. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2718178. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2739941. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2498254. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2742616. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2747441. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2726730. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2729895. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2730735. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489384. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2470045. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2727785. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2752556. 
Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2734030. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2488395. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2756956. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2754780. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2737378. Maximum sequence length: 2049, sample length: 4306 [default0]:Skipping sample id=2751147. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2750846. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2753645. Maximum sequence length: 2049, sample length: 4347 [default0]:Skipping sample id=2753629. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2749378. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2736462. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2495192. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2736295. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2742864. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2713216. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2495307. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2734015. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2749767. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2726552. Maximum sequence length: 2049, sample length: 5816 [default0]:Skipping sample id=2729380. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2737803. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2735101. 
Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738239. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2721340. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2494330. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2746326. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2745577. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2496067. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2470534. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2716193. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2495696. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2746193. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2479996. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2752239. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2722443. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2713801. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2717948. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2712317. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2724702. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2732329. Maximum sequence length: 2049, sample length: 7273 [default0]:Skipping sample id=2732280. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2467217. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2747932. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2743765. 
Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2482525. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2490070. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2497311. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2492758. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2734203. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2738232. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2742824. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2717138. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2721540. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2746721. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2753782. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2750264. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2748106. Maximum sequence length: 2049, sample length: 5809 [default0]:Skipping sample id=2724907. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2747816. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2730311. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2740801. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2466631. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2720646. Maximum sequence length: 2049, sample length: 5853 [default0]:Skipping sample id=2715068. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2746202. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2756565. 
Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2750121. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2490424. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2755908. Maximum sequence length: 2049, sample length: 4248 [default0]:Skipping sample id=2724515. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2730142. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2715657. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2714243. Maximum sequence length: 2049, sample length: 5207 [default0]:Skipping sample id=2477889. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2749873. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2741201. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2736522. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2717784. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2714287. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2741824. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2732222. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2725603. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2713374. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2482998. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2713241. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2752075. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2714720. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2715405. 
Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2498615. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2722431. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2717473. Maximum sequence length: 2049, sample length: 4919 [default0]:Skipping sample id=2717950. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2481909. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2751329. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2735629. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2719654. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2731049. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2467806. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2748586. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2749705. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2492532. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2735116. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2742193. Maximum sequence length: 2049, sample length: 6523 [default0]:Skipping sample id=2718738. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2731084. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2743176. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2720517. Maximum sequence length: 2049, sample length: 6431 [default0]:Skipping sample id=2711256. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2730600. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2752895. 
Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2751973. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2744279. Maximum sequence length: 2049, sample length: 3142 [default0]:Skipping sample id=2738216. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2722715. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2739596. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2712186. Maximum sequence length: 2049, sample length: 7105 [default0]:Skipping sample id=2485521. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2740407. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2486731. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2479899. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2481998. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2721693. Maximum sequence length: 2049, sample length: 3831 [default0]:Skipping sample id=2756020. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2717359. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2729406. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2741501. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2733277. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2742247. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2745786. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2725589. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2725127. Maximum sequence length: 2049, sample length: 3682 [default0]:Skipping sample id=2746955. 
Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2736206. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2746766. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2749805. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2728476. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2733421. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733321. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2738734. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2716964. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2729052. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2741842. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2496665. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2744453. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2726390. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2736904. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2727126. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2719527. Maximum sequence length: 2049, sample length: 3957 [default0]:Skipping sample id=2756671. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2739423. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2481962. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2735408. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2712861. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2734717. 
Maximum sequence length: 2049, sample length: 4693 [default0]:Skipping sample id=2713778. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2718736. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2736000. Maximum sequence length: 2049, sample length: 5383 [default0]:Skipping sample id=2494456. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2754532. Maximum sequence length: 2049, sample length: 4130 [default0]:Skipping sample id=2739650. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2716080. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2724678. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2734445. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2466568. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2735108. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2718284. Maximum sequence length: 2049, sample length: 7210 [default0]:Skipping sample id=2740426. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2751801. Maximum sequence length: 2049, sample length: 4423 [default0]:Skipping sample id=2482539. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2737725. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2724858. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2739134. Maximum sequence length: 2049, sample length: 3064 [default0]:Skipping sample id=2725678. Maximum sequence length: 2049, sample length: 4608 [default0]:Skipping sample id=2489594. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2719595. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2731308. 
Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2715899. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2730656. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2714630. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2716354. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2731643. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2490888. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2478350. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2734728. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2749135. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2747611. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2755857. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2712338. Maximum sequence length: 2049, sample length: 5747 [default0]:Skipping sample id=2726267. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2713590. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2748250. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2752604. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2719341. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2491020. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2716989. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2731985. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2740477. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2724595. 
Maximum sequence length: 2049, sample length: 5561 [default0]:Skipping sample id=2752281. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2720570. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2725794. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2751540. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2480178. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2751268. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2477848. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2736428. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2729911. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2724821. Maximum sequence length: 2049, sample length: 3094 [default0]:Skipping sample id=2479274. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2731723. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2748478. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2717642. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2752488. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2711363. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2755035. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2726863. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2729514. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2478761. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717067. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2728879. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2737609. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2492992. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2756749. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2724531. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2744466. Maximum sequence length: 2049, sample length: 4920 [default0]:Skipping sample id=2717028. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2756371. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2743175. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727789. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2747956. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2736619. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2750835. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2733922. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2477525. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2741822. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2721734. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2740095. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2751875. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2735486. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2737537. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2733110. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2721239. 
Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2719729. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2751980. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2738114. Maximum sequence length: 2049, sample length: 4619 [default0]:Skipping sample id=2748853. Maximum sequence length: 2049, sample length: 5278 [default0]:Skipping sample id=2729816. Maximum sequence length: 2049, sample length: 6455 [default0]:Skipping sample id=2719810. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2734366. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2747886. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2728292. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733054. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2732682. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2748957. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2727601. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2749666. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2734353. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2735905. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2752258. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2754498. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2754714. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2719121. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2712827. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2715723. 
Maximum sequence length: 2049, sample length: 8161 [default0]:Skipping sample id=2733180. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2751954. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2738205. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2725974. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2738040. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2749201. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2749435. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2466094. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2470780. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2478045. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2755289. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2737268. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2721306. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2492990. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2484905. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2751206. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736690. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2752821. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2729176. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2713696. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2745003. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2716009. 
Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2711760. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2721692. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2731679. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2715646. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2742208. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2719884. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2717310. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2735858. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2728613. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2734309. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2718442. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2722882. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2735169. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2754881. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2732868. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2746368. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2723361. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2753870. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2724339. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2732427. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2727246. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2725099. 
Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2726156. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2730327. Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2729729. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2728405. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2493017. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2740083. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2749647. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2743970. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2727630. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2732303. Maximum sequence length: 2049, sample length: 6924 [default0]:Skipping sample id=2722050. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2718892. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2730921. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2726493. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2751195. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2711716. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2727459. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2752697. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2754924. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2742618. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2746144. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2743199. 
Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2754956. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2741535. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2718675. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2482946. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2733091. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2496155. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2715716. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2752835. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2724938. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2470505. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2467113. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2721560. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2748062. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2722720. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2754382. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2470716. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2750948. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2742115. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2720307. Maximum sequence length: 2049, sample length: 5990 [default0]:Skipping sample id=2737587. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2481071. Maximum sequence length: 2049, sample length: 4272 [default0]:Skipping sample id=2720124. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2741922. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2752841. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2738995. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753349. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2497140. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2716205. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2726627. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2742323. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2735168. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2713624. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2480549. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2738383. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2728433. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2739773. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2733846. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2724077. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2744525. Maximum sequence length: 2049, sample length: 5552 [default0]:Skipping sample id=2746433. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2719151. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2743932. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2727976. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2726153. 
Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2735187. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713703. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2478532. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2721987. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2745698. Maximum sequence length: 2049, sample length: 4531 [default0]:Skipping sample id=2477273. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2735665. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2488620. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2728523. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2722266. Maximum sequence length: 2049, sample length: 4578 [default0]:Skipping sample id=2466312. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2482919. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2716132. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2747961. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2754740. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2497594. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2731347. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2744140. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2491043. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2726009. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2753462. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2481533. 
Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2477811. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2731839. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2746606. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2714270. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2755677. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2739744. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2717202. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2739888. Maximum sequence length: 2049, sample length: 4949 [default0]:Skipping sample id=2730964. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2750915. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2733845. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2483514. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2714440. Maximum sequence length: 2049, sample length: 6863 [default0]:Skipping sample id=2745291. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2717248. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2722261. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2742382. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2733222. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2713742. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2749775. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2715586. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2736474. 
Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2466746. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2740767. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2756857. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2732278. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2750508. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2711245. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2728098. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2487141. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2721660. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2718886. Maximum sequence length: 2049, sample length: 5583 [default0]:Skipping sample id=2488833. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2756464. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2730717. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2736391. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2754765. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2718066. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2734301. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2742690. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2740138. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2711099. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2756816. Maximum sequence length: 2049, sample length: 4315 [default0]:Skipping sample id=2734108. 
Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2723413. Maximum sequence length: 2049, sample length: 7217 [default0]:Skipping sample id=2722061. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2488935. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2728198. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2729872. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2724769. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2732517. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2720208. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2467198. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2498232. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2737932. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2730940. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2757097. Maximum sequence length: 2049, sample length: 5872 [default0]:Skipping sample id=2729858. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2745843. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2719765. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2721499. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2735165. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2756382. Maximum sequence length: 2049, sample length: 4368 [default0]:Skipping sample id=2728384. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2482360. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2741694. 
Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2722042. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2733261. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2718816. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2719511. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2755806. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2737432. Maximum sequence length: 2049, sample length: 6759 [default0]:Skipping sample id=2738402. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2745471. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2745619. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2755581. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2484169. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2719778. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2747865. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2739424. Maximum sequence length: 2049, sample length: 5021 [default0]:Skipping sample id=2712649. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2714120. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2714436. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2752518. Maximum sequence length: 2049, sample length: 5018 [default0]:Skipping sample id=2751549. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2719142. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2711809. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2729588. 
Maximum sequence length: 2049, sample length: 4693 [default0]:Skipping sample id=2468394. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2714779. Maximum sequence length: 2049, sample length: 4803 [default0]:Skipping sample id=2717865. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2739549. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2744522. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2742173. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2750475. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2721972. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2726698. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2712679. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2735551. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2756633. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2484279. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2730660. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2725414. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2726004. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2735001. Maximum sequence length: 2049, sample length: 4413 [default0]:Skipping sample id=2725459. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2733041. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2734391. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2727267. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2481808. 
Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2735876. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2755120. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2734111. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2746979. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2719939. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2497937. Maximum sequence length: 2049, sample length: 4274 [default0]:Skipping sample id=2716347. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2750382. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2737392. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2737817. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2735779. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2735926. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2739159. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2486807. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2753964. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2486741. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732291. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2753346. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745810. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2719509. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2755420. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2741087. 
Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2756188. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2731712. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2724215. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2756153. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2724251. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2754373. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2737048. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2731058. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2749403. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2713237. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2733849. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2753358. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2736407. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2755864. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2724037. Maximum sequence length: 2049, sample length: 5128 [default0]:Skipping sample id=2710981. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2720706. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2721204. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2731612. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2740033. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2756021. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2732471. 
Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2485955. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2750225. Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2752299. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2748209. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2719871. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2481273. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2750670. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2733117. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2754019. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713123. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2730279. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2465922. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2715711. Maximum sequence length: 2049, sample length: 4211 [default0]:Skipping sample id=2755072. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2739510. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2498056. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2747668. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2491967. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2741814. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2728685. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2733431. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2734188. 
Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2751226. Maximum sequence length: 2049, sample length: 5494 [default0]:Skipping sample id=2487371. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2711634. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2748609. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2715992. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2743412. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2719896. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2726993. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2756955. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2723052. Maximum sequence length: 2049, sample length: 4549 [default0]:Skipping sample id=2735787. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2490382. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2712326. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2721611. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2750007. Maximum sequence length: 2049, sample length: 4474 [default0]:Skipping sample id=2747359. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2490095. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2719115. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2727958. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2727286. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2747170. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2756247. 
Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2729012. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2726114. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2491972. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2756679. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2741840. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2753191. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2712029. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2727272. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2732558. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2492966. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2724015. Maximum sequence length: 2049, sample length: 5590 [default0]:Skipping sample id=2747098. Maximum sequence length: 2049, sample length: 4354 [default0]:Skipping sample id=2733343. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2755707. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2728436. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2711309. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2720321. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2735106. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2730588. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2751719. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2727165. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2478149. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2723650. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2496213. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2491240. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2741326. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2721493. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2728401. Maximum sequence length: 2049, sample length: 3398 [default0]:Skipping sample id=2751781. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2721657. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2725498. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2713679. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2477467. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2736496. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2713849. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2715179. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2719823. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2724072. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2713739. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2725476. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2734475. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2730442. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2467805. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2482884. 
Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2743323. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2717280. Maximum sequence length: 2049, sample length: 5070 [default0]:Skipping sample id=2742943. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2712750. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2749532. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2740245. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2737263. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2731236. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2724865. Maximum sequence length: 2049, sample length: 4992 [default0]:Skipping sample id=2732630. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2720152. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2719191. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2722488. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2489583. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2722589. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2756910. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2736729. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2496464. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2750980. Maximum sequence length: 2049, sample length: 7562 [default0]:Skipping sample id=2731491. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2733687. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2483191. 
Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2716033. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2712694. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2753754. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2734747. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2498154. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2735406. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2751733. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2737703. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2720092. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2735767. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2742417. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2720019. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2749836. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2751078. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2723077. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2720308. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2753247. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2711873. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2716419. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2715207. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2720727. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2726643. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2498164. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2750774. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2718797. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2483360. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2732837. Maximum sequence length: 2049, sample length: 6482 [default0]:Skipping sample id=2480475. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2715570. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2734352. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2727739. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2738500. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2738388. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2751022. Maximum sequence length: 2049, sample length: 5437 [default0]:Skipping sample id=2734189. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2726308. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2725401. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2714398. Maximum sequence length: 2049, sample length: 3957 [default0]:Skipping sample id=2748229. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2495046. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2727415. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2715178. Maximum sequence length: 2049, sample length: 5990 [default0]:Skipping sample id=2751164. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2720635. 
Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2716530. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2471117. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2714972. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2743423. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2751118. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2723842. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2738212. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2724645. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2724222. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2724107. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2740861. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2726036. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2728419. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2467707. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2751080. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711572. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2720240. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2731802. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2736397. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2753306. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2726759. Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2711018. 
Maximum sequence length: 2049, sample length: 5013 [default0]:Skipping sample id=2725837. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2729060. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2755012. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2469804. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498656. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2717218. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2754406. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2484998. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2734757. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2743711. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2719172. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2730470. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2470675. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2745282. Maximum sequence length: 2049, sample length: 4289 [default0]:Skipping sample id=2730258. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2731253. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2721623. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2736759. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2497370. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2743496. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2720846. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2722608. 
Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2735668. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2720922. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2755517. Maximum sequence length: 2049, sample length: 4295 [default0]:Skipping sample id=2727357. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2753646. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2754507. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2752228. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2711066. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2731966. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2467525. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2746412. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2489776. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2722774. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2717624. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2751471. Maximum sequence length: 2049, sample length: 4049 [default0]:Skipping sample id=2482957. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2720421. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2712244. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2753538. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2730663. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2721661. Maximum sequence length: 2049, sample length: 4437 [default0]:Skipping sample id=2723015. 
Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2755295. Maximum sequence length: 2049, sample length: 6560 [default0]:Skipping sample id=2714578. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2719606. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2718907. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2725083. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2715681. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2497675. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2714884. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2735868. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2716944. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2497580. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2728569. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2745050. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2736159. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2469451. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2731223. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2755780. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2722489. Maximum sequence length: 2049, sample length: 5524 [default0]:Skipping sample id=2714199. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2486067. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2741338. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2745405. 
Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2718503. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2749832. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2728225. Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2740081. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2752163. Maximum sequence length: 2049, sample length: 4676 [default0]:Skipping sample id=2719233. Maximum sequence length: 2049, sample length: 4572 [default0]:Skipping sample id=2496681. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2740964. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2752279. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2499398. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2735488. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2729325. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2494940. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2727632. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2745744. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2717548. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2482408. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2733259. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2722701. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2720916. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2719580. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2741141. 
Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2467249. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2756925. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2743464. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2732263. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2717321. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2722451. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2712791. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2495497. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2729999. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2749820. Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2737568. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2721815. Maximum sequence length: 2049, sample length: 4119 [default0]:Skipping sample id=2747657. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2728305. Maximum sequence length: 2049, sample length: 4924 [default0]:Skipping sample id=2742209. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2487897. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2752689. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2493717. Maximum sequence length: 2049, sample length: 3097 [default0]:Skipping sample id=2736155. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2744344. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2752765. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2723600. 
Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2754480. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2720741. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2467015. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2745763. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2495216. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2716687. Maximum sequence length: 2049, sample length: 4718 [default0]:Skipping sample id=2755620. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2743692. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2719517. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2743236. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2741073. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2746128. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2745819. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2751885. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2754789. Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2724421. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2734051. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2735734. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2487486. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2720763. Maximum sequence length: 2049, sample length: 4233 [default0]:Skipping sample id=2750885. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2725757. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2716565. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2741110. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2733602. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2484813. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2732811. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2747537. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2755356. Maximum sequence length: 2049, sample length: 4108 [default0]:Skipping sample id=2748146. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2744034. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2749045. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2492236. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2753501. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2479719. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2729511. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2496306. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2731919. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2750324. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2753018. Maximum sequence length: 2049, sample length: 4941 [default0]:Skipping sample id=2739349. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2478356. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2728544. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2724501. 
Maximum sequence length: 2049, sample length: 5214 [default0]:Skipping sample id=2746700. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2720637. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2741513. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2715282. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2737728. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2741129. Maximum sequence length: 2049, sample length: 5264 [default0]:Skipping sample id=2496333. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2727795. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2746469. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2744648. Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2733200. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2750162. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2483113. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2713802. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2719910. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2712395. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2746886. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2734318. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2729366. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2753294. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2731516. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2711214. 
Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2736059. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2725588. Maximum sequence length: 2049, sample length: 2592 [default0]:Skipping sample id=2747781. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2744186. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2749597. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2747950. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2722098. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2725014. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2718409. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2746471. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2724264. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2743312. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2727499. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2736998. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2715121. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2740275. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2732545. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2740820. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2717678. Maximum sequence length: 2049, sample length: 4764 [default0]:Skipping sample id=2713783. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2734022. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2745661. 
Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2743891. Maximum sequence length: 2049, sample length: 6434 [default0]:Skipping sample id=2751322. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2747150. Maximum sequence length: 2049, sample length: 4529 [default0]:Skipping sample id=2737563. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2735110. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2746851. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2713744. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2481020. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2711045. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2739178. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2729599. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2730096. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2483877. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2727241. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2743738. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2727434. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2466873. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724933. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2743766. Maximum sequence length: 2049, sample length: 5147 [default0]:Skipping sample id=2748258. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2749853. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2728242. 
Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2734982. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2729188. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2756658. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2711631. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2482156. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2756134. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2719392. Maximum sequence length: 2049, sample length: 6639 [default0]:Skipping sample id=2467206. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2484671. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2723531. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2739229. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2714524. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2719490. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2750134. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2738505. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2489136. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724775. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2716459. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2718290. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2715915. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2493162. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2713669. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2712723. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2751233. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2727782. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2724892. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2756618. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2717112. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2466234. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2737890. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2737635. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2731416. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2732454. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2494567. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2716747. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2741847. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2742300. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2727127. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2728302. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2716128. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2738472. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2746783. Maximum sequence length: 2049, sample length: 4753 [default0]:Skipping sample id=2470503. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2726822. 
Maximum sequence length: 2049, sample length: 3814 [default0]:Skipping sample id=2728374. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2735963. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2717703. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729606. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2736485. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2737337. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2739182. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2719058. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2755718. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2487690. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2470526. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2721840. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2732002. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2721101. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2750787. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2482634. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2722093. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2724737. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2714782. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2735792. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2747112. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2716548. 
Maximum sequence length: 2049, sample length: 6684 [default0]:Skipping sample id=2711163. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2739363. Maximum sequence length: 2049, sample length: 4494 [default0]:Skipping sample id=2747106. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2723235. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2740662. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2734426. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2727692. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2465773. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2736895. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2728393. Maximum sequence length: 2049, sample length: 5032 [default0]:Skipping sample id=2483440. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2719310. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2753827. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2484932. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2740583. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2481344. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2725328. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2481030. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2723773. Maximum sequence length: 2049, sample length: 5265 [default0]:Skipping sample id=2734789. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2733130. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2738293. 
Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2755293. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2723747. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2755419. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2499135. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2728996. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2755977. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2721677. Maximum sequence length: 2049, sample length: 6691 [default0]:Skipping sample id=2489154. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2495359. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2711804. Maximum sequence length: 2049, sample length: 4166 [default0]:Skipping sample id=2722090. Maximum sequence length: 2049, sample length: 5651 [default0]:Skipping sample id=2750968. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2730087. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2743963. Maximum sequence length: 2049, sample length: 6212 [default0]:Skipping sample id=2737989. Maximum sequence length: 2049, sample length: 4687 [default0]:Skipping sample id=2721031. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2748815. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2752797. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2739517. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2735552. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2735873. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2753695. 
Maximum sequence length: 2049, sample length: 3702 [default0]:Skipping sample id=2722501. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2723173. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2748622. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2740269. Maximum sequence length: 2049, sample length: 4197 [default0]:Skipping sample id=2752779. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2717911. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2725718. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2725975. Maximum sequence length: 2049, sample length: 3614 [default0]:Skipping sample id=2466248. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2735224. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2734545. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2722952. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2726706. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2730828. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2721041. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2750383. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2495475. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2754564. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2746038. Maximum sequence length: 2049, sample length: 3051 [default0]:Skipping sample id=2750873. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2728717. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2734698. 
Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2746014. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2741342. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2742256. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2490840. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2738677. Maximum sequence length: 2049, sample length: 3431 [default0]:Skipping sample id=2731615. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2741640. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2729566. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2722980. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2729567. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2754945. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2715597. Maximum sequence length: 2049, sample length: 5257 [default0]:Skipping sample id=2729500. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2741749. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2741062. Maximum sequence length: 2049, sample length: 4549 [default0]:Skipping sample id=2752263. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2724429. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2724528. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2750002. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2722463. Maximum sequence length: 2049, sample length: 4756 [default0]:Skipping sample id=2725471. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2743501. 
Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2710998. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2725167. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2743733. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2711343. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2717289. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2734097. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2751981. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2745166. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2729154. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2728293. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2728912. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2484750. Maximum sequence length: 2049, sample length: 3451 [default0]:Skipping sample id=2478543. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2746359. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2477789. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2486287. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2479648. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2715389. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2718644. Maximum sequence length: 2049, sample length: 5258 [default0]:Skipping sample id=2719692. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2711879. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2726149. 
Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2752911. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2733743. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2479637. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2730126. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2732702. Maximum sequence length: 2049, sample length: 7095 [default0]:Skipping sample id=2742848. Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2711492. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2496839. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2729056. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2738390. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2724370. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2755557. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2726126. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2491689. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2730054. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2494782. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2727176. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2479128. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2711406. Maximum sequence length: 2049, sample length: 4052 [default0]:Skipping sample id=2752435. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2494446. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2717555. 
Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2741651. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2735504. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2466316. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2721312. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2725849. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2733876. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2721373. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2714091. Maximum sequence length: 2049, sample length: 5313 [default0]:Skipping sample id=2748126. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2740702. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2729287. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2755153. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2723931. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2717124. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2747712. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2712624. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2466224. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2743705. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2735133. Maximum sequence length: 2049, sample length: 6604 [default0]:Skipping sample id=2736935. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2720348. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2727956. 
Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2754213. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2723772. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2477497. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2712587. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2468659. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2726369. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2739302. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2468772. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2737532. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2737477. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2713964. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2711702. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2721052. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2753687. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2737888. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2736944. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2752466. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2721694. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2734937. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2750452. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2727275. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2718643. 
Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2726350. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2735736. Maximum sequence length: 2049, sample length: 5954 [default0]:Skipping sample id=2724382. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2743539. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2724580. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2713572. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2735330. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2752816. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2488541. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2732377. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2716364. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2747427. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2721311. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2467867. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2733376. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2752807. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2494317. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2743632. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2747425. Maximum sequence length: 2049, sample length: 5036 [default0]:Skipping sample id=2742654. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2496209. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2746344. 
Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2721472. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2747848. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2730456. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2716231. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2748653. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2744432. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2727553. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2727190. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2741511. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2731559. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2735908. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2712865. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2726804. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2739726. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2731982. Maximum sequence length: 2049, sample length: 5779 [default0]:Skipping sample id=2727011. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2750227. Maximum sequence length: 2049, sample length: 4110 [default0]:Skipping sample id=2723097. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2743068. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2485226. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2729454. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2493882. 
Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2755139. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2732919. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2733140. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2717507. Maximum sequence length: 2049, sample length: 4485 [default0]:Skipping sample id=2745516. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2466891. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2747933. Maximum sequence length: 2049, sample length: 14228 [default0]:Skipping sample id=2753954. Maximum sequence length: 2049, sample length: 4034 [default0]:Skipping sample id=2479138. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2746921. Maximum sequence length: 2049, sample length: 4847 [default0]:Skipping sample id=2730051. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2722145. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2741659. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2750796. Maximum sequence length: 2049, sample length: 6452 [default0]:Skipping sample id=2496765. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2714588. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2736803. Maximum sequence length: 2049, sample length: 4697 [default0]:Skipping sample id=2732140. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2726860. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2740681. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2737281. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2723485. 
Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2727031. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2722046. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2719819. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2719603. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2478932. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2734637. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2755639. Maximum sequence length: 2049, sample length: 4738 [default0]:Skipping sample id=2754556. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2743112. Maximum sequence length: 2049, sample length: 5487 [default0]:Skipping sample id=2493766. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2747762. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2739019. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2725897. Maximum sequence length: 2049, sample length: 3007 [default0]:Skipping sample id=2750377. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2479081. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2725912. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2752956. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2742786. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2747814. Maximum sequence length: 2049, sample length: 4614 [default0]:Skipping sample id=2729213. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2748946. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2712225. 
Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2481706. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2729011. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2741832. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2741392. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2727544. Maximum sequence length: 2049, sample length: 3865 [default0]:Skipping sample id=2746925. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733959. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2737404. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2740532. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2740349. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2744326. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2716029. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2731810. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2746337. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2735542. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2720178. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2735584. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2755922. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2717630. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2755521. Maximum sequence length: 2049, sample length: 7512 [default0]:Skipping sample id=2754896. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2739239. 
Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2465972. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2717128. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2495409. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2466649. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2718811. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2737997. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2721806. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2745315. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2733351. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2725581. Maximum sequence length: 2049, sample length: 5682 [default0]:Skipping sample id=2750439. Maximum sequence length: 2049, sample length: 3314 [default0]:Skipping sample id=2499052. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2734869. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2722267. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2746047. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2740955. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2750539. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2735196. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2494396. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2712276. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2730177. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2466102. 
Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2741792. Maximum sequence length: 2049, sample length: 4839 [default0]:Skipping sample id=2752738. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2751290. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2489369. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2718771. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2719192. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731607. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2752539. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2732014. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712219. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2730144. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2747927. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2720576. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2726628. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2740342. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2753263. Maximum sequence length: 2049, sample length: 5331 [default0]:Skipping sample id=2741040. Maximum sequence length: 2049, sample length: 5337 [default0]:Skipping sample id=2740890. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2730860. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2754333. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2754449. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2731735. 
Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2717968. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2737826. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2498784. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2470237. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2725128. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2752886. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2736871. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2751298. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2731556. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2733239. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2715489. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2734578. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2468533. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2739118. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2717926. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2745606. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2478258. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2741557. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2740066. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2724846. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2731664. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2483417. 
Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2490782. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2719523. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2753230. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2735389. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2744297. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2718713. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2716787. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2722440. Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2746544. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2730163. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2752287. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2753254. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2745235. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2715906. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2744678. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2756789. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2716443. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2733570. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2755847. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2732486. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2739216. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2738886. 
Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2499088. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2713537. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2713029. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2489844. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2493615. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2715143. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2733311. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2735847. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2736637. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2754409. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2745156. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2735826. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2713807. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2488211. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715219. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2751807. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2756402. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2724782. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2489005. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2749934. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2743200. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2745313. 
Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2752385. Maximum sequence length: 2049, sample length: 3797 [default0]:Skipping sample id=2495591. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2746485. Maximum sequence length: 2049, sample length: 3938 [default0]:Skipping sample id=2750277. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2739874. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2753276. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2755685. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2467399. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2731552. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2718837. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2719890. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2466523. Maximum sequence length: 2049, sample length: 2833 [default0]:Skipping sample id=2714709. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2749402. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2750274. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2734008. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2482201. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2720061. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2730917. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2730726. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2467518. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2468361. 
Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2722341. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2742552. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2722405. Maximum sequence length: 2049, sample length: 3616 [default0]:Skipping sample id=2725949. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2750084. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2730452. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2492641. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2492736. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2729039. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2736038. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2753413. Maximum sequence length: 2049, sample length: 5179 [default0]:Skipping sample id=2756311. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2481769. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2478029. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2717726. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2477691. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2752656. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2712886. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2489099. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2723179. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2747692. Maximum sequence length: 2049, sample length: 3503 [default0]:Skipping sample id=2747217. 
Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2725020. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2718809. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2753466. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2735371. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2493203. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2751320. Maximum sequence length: 2049, sample length: 4558 [default0]:Skipping sample id=2741520. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2751497. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2743828. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2720808. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2751443. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2754869. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2718239. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2750220. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2718604. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2716562. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2490533. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2466601. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2733537. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2754892. Maximum sequence length: 2049, sample length: 5110 [default0]:Skipping sample id=2493543. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2730813. 
Maximum sequence length: 2049, sample length: 4147 [default0]:Skipping sample id=2743182. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2712014. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2478534. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2727379. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2739810. Maximum sequence length: 2049, sample length: 5166 [default0]:Skipping sample id=2745396. Maximum sequence length: 2049, sample length: 4398 [default0]:Skipping sample id=2747765. Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2737374. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2740864. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2469429. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2737302. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2739473. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2471056. Maximum sequence length: 2049, sample length: 3095 [default0]:Skipping sample id=2470983. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2752948. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2739317. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2483747. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2717315. Maximum sequence length: 2049, sample length: 7105 [default0]:Skipping sample id=2754266. Maximum sequence length: 2049, sample length: 4795 [default0]:Skipping sample id=2493489. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2498828. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2752630. 
Maximum sequence length: 2049, sample length: 3251 [default0]:Skipping sample id=2732854. Maximum sequence length: 2049, sample length: 5737 [default0]:Skipping sample id=2725461. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2495914. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2752952. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2714342. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2712148. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2483089. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753000. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2478573. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2713607. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2722483. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2714208. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2732314. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2484219. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2734013. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2488803. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2494046. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2713422. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2729058. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2470958. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2752045. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2477004. 
Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2720926. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2746436. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2729552. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2736405. Maximum sequence length: 2049, sample length: 3867 [default0]:Skipping sample id=2742065. Maximum sequence length: 2049, sample length: 3049 [default0]:Skipping sample id=2731241. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2746664. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2719402. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2480656. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2717179. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2745844. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2484655. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2732921. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2478371. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2484969. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2737561. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2753594. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2723375. Maximum sequence length: 2049, sample length: 6550 [default0]:Skipping sample id=2734247. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2477989. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2726469. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2496870. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2720468. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2723954. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2719294. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2754002. Maximum sequence length: 2049, sample length: 4824 [default0]:Skipping sample id=2726680. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2711675. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2723966. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2494301. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2755302. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732800. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2733111. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2714865. Maximum sequence length: 2049, sample length: 3880 [default0]:Skipping sample id=2754301. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2746167. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2490054. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2739556. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2720444. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2734538. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2756512. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2734138. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2732904. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2711842. 
Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2736170. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2717833. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2739175. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2728724. Maximum sequence length: 2049, sample length: 5023 [default0]:Skipping sample id=2732836. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2470754. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2746893. Maximum sequence length: 2049, sample length: 4237 [default0]:Skipping sample id=2742207. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2712683. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2719653. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2736185. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2717565. Maximum sequence length: 2049, sample length: 4548 [default0]:Skipping sample id=2725844. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2492238. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2748670. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2714508. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2715300. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2466337. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2493092. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2714517. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2721548. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2744954. 
Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2738020. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2714420. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2732440. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2721552. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2754625. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2498779. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2753296. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2749711. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2736560. Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2744441. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2721316. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2728381. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2752783. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2496004. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2740852. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2727488. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2717132. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2499339. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2732985. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2746406. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2724419. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2756990. 
Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2720059. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2731987. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2746834. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2748301. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2721729. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2735918. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2745483. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2723272. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2731217. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2744656. Maximum sequence length: 2049, sample length: 4559 [default0]:Skipping sample id=2754966. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2742296. Maximum sequence length: 2049, sample length: 5029 [default0]:Skipping sample id=2752684. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2737061. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2726170. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2735295. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2749528. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2727305. Maximum sequence length: 2049, sample length: 3871 [default0]:Skipping sample id=2484473. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2483511. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2755787. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2749687. 
Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2738439. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2471029. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2714787. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2733540. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2477795. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2737930. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2752548. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2712638. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2479347. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2718984. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2739569. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2726678. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2717619. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2733475. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2720809. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2755678. Maximum sequence length: 2049, sample length: 3901 [default0]:Skipping sample id=2756664. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2752501. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2470670. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2718175. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2731046. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2732256. 
Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2741390. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725230. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2752943. Maximum sequence length: 2049, sample length: 4612 [default0]:Skipping sample id=2729831. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2755626. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2751529. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2477173. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2730285. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2471002. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2716810. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2733836. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2726535. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2713365. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2726374. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2747320. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2714426. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2737407. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2731523. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2744249. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2745717. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2736148. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2745893. 
Maximum sequence length: 2049, sample length: 5430 [default0]:Skipping sample id=2469791. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2746342. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2731381. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2716100. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2471231. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2715517. Maximum sequence length: 2049, sample length: 5853 [default0]:Skipping sample id=2715585. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2730876. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2720411. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2744111. Maximum sequence length: 2049, sample length: 4744 [default0]:Skipping sample id=2748555. Maximum sequence length: 2049, sample length: 4665 [default0]:Skipping sample id=2743595. Maximum sequence length: 2049, sample length: 3893 [default0]:Skipping sample id=2756801. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2717351. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2746665. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2480571. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2723578. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2744364. Maximum sequence length: 2049, sample length: 3756 [default0]:Skipping sample id=2745222. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2715102. Maximum sequence length: 2049, sample length: 4304 [default0]:Skipping sample id=2751948. Maximum sequence length: 2049, sample length: 4697 [default0]:Skipping sample id=2713455. 
Maximum sequence length: 2049, sample length: 5717 [default0]:Skipping sample id=2714776. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2724206. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2736988. Maximum sequence length: 2049, sample length: 3623 [default0]:Skipping sample id=2717311. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2719053. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2499446. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2727617. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2711606. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2750031. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2743485. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2715264. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2746769. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2753066. Maximum sequence length: 2049, sample length: 7073 [default0]:Skipping sample id=2712597. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2750270. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2714877. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2482511. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2735808. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2719667. Maximum sequence length: 2049, sample length: 4775 [default0]:Skipping sample id=2722063. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2738506. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2721567. 
Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2497318. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2755476. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2717743. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2715816. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2717510. Maximum sequence length: 2049, sample length: 5332 [default0]:Skipping sample id=2721489. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2711506. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2750256. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2747471. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2711824. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2483975. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2724459. Maximum sequence length: 2049, sample length: 4710 [default0]:Skipping sample id=2719549. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2485912. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2726035. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2739651. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2730543. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2722605. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2465872. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2730013. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2729644. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2724199. 
Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2712719. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2739172. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2721308. Maximum sequence length: 2049, sample length: 2884 [default0]:Skipping sample id=2711267. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2489787. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2749631. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2750578. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2717994. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2711230. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2742587. Maximum sequence length: 2049, sample length: 7102 [default0]:Skipping sample id=2493641. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2730834. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2733198. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2711420. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2722666. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2489104. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2715650. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2716880. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2727860. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2752524. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2712648. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2718172. 
Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2712423. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712223. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2736409. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731025. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498676. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2742736. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2729365. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2751234. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2477362. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2748587. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2755656. Maximum sequence length: 2049, sample length: 4203 [default0]:Skipping sample id=2482222. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2748482. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2751017. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2718017. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2736173. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2741888. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2712294. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2713757. Maximum sequence length: 2049, sample length: 5583 [default0]:Skipping sample id=2727300. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2745556. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2755466. 
Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2716858. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2741414. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2726232. Maximum sequence length: 2049, sample length: 2884 [default0]:Skipping sample id=2481674. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2746186. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2740332. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2755813. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2713938. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2732378. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2746183. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2716821. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2721896. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2751917. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2753963. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2737409. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2495008. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2735124. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2753639. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2731563. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2494391. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2748612. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2746307. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2736377. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2751687. Maximum sequence length: 2049, sample length: 4317 [default0]:Skipping sample id=2488939. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2743332. Maximum sequence length: 2049, sample length: 4098 [default0]:Skipping sample id=2748492. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2738872. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2724687. Maximum sequence length: 2049, sample length: 6472 [default0]:Skipping sample id=2720664. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2711424. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2751600. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2729685. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2493503. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2721303. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2738473. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2752091. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2737127. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2725249. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2712628. Maximum sequence length: 2049, sample length: 5357 [default0]:Skipping sample id=2718194. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2722387. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2756055. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2720669. 
Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2735035. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2720906. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2738592. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2752666. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2482986. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2721691. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2466254. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2729548. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2734467. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2751648. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2489424. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2754655. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2730032. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2718432. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2737523. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2739367. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2725504. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2740233. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2478449. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2727981. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2756120. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2718796. 
Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2476990. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2746640. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2720512. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2743004. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2736307. Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2735674. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2751363. Maximum sequence length: 2049, sample length: 4087 [default0]:Skipping sample id=2719768. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2740424. Maximum sequence length: 2049, sample length: 4534 [default0]:Skipping sample id=2719210. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2467846. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2730033. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2713593. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2714486. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2467885. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2756734. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2728434. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2713346. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2711196. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2478492. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2719010. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2739307. 
Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2742372. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2725730. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2732688. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2755966. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2735942. Maximum sequence length: 2049, sample length: 4147 [default0]:Skipping sample id=2716136. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2741653. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2746162. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2731456. Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2481622. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2735795. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2466550. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2736921. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2742740. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2732432. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2729427. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2747662. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2489458. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2719068. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2733097. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2498086. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2730296. 
Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2722430. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2717219. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2486842. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2755043. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2713283. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2755023. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2477704. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2756927. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2747194. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2755729. Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2740120. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2730982. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2737488. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2484428. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2752554. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2481949. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2746131. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2739266. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2756992. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2722997. Maximum sequence length: 2049, sample length: 6309 [default0]:Skipping sample id=2747030. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2728265. 
Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2719579. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2755924. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2722494. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2725597. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2749908. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2741307. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2711078. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2720845. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2747257. Maximum sequence length: 2049, sample length: 3660 [default0]:Skipping sample id=2722067. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2751492. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2711237. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2724095. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2716651. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2745391. Maximum sequence length: 2049, sample length: 6480 [default0]:Skipping sample id=2714474. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2487713. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2744015. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2744482. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2713874. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2731240. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2729118. 
Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2753603. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2714857. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2477760. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2750041. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2724209. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2730780. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2756479. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2733797. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2735958. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2466142. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737590. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2747708. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2477742. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2741772. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2719135. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2750533. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2726392. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2471237. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2739482. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2715041. Maximum sequence length: 2049, sample length: 5231 [default0]:Skipping sample id=2744993. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2717241. 
Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2742293. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2723914. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2729774. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2748745. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2743398. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2731878. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2492693. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2726913. Maximum sequence length: 2049, sample length: 4673 [default0]:Skipping sample id=2747259. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2750675. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2712331. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2749401. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718621. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2717470. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2749881. Maximum sequence length: 2049, sample length: 5152 [default0]:Skipping sample id=2741085. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2493530. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2728764. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2714803. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2487651. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2729536. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2471049. 
Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2728667. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2714252. Maximum sequence length: 2049, sample length: 4537 [default0]:Skipping sample id=2741592. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2747392. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2747617. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2738297. Maximum sequence length: 2049, sample length: 4657 [default0]:Skipping sample id=2728603. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2733103. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2734903. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2755511. Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2730671. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2757038. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2738755. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2727698. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2731166. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2745809. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2719337. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2465818. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2469274. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2745826. Maximum sequence length: 2049, sample length: 3502 [default0]:Skipping sample id=2737227. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2729904. 
Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2723764. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2471016. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2749064. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2467257. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2712709. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2477701. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2745366. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2734401. Maximum sequence length: 2049, sample length: 4142 [default0]:Skipping sample id=2751257. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2740710. Maximum sequence length: 2049, sample length: 3713 [default0]:Skipping sample id=2744897. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2755326. Maximum sequence length: 2049, sample length: 4246 [default0]:Skipping sample id=2740337. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2728924. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2715441. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2498358. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2739338. Maximum sequence length: 2049, sample length: 4799 [default0]:Skipping sample id=2751907. Maximum sequence length: 2049, sample length: 4077 [default0]:Skipping sample id=2721294. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2715424. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2747130. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2734123. 
Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2721826. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2721505. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2739480. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2485561. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2751937. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2753619. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2486345. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2748467. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2737876. Maximum sequence length: 2049, sample length: 5325 [default0]:Skipping sample id=2716104. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2756868. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2753505. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724276. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2729336. Maximum sequence length: 2049, sample length: 6007 [default0]:Skipping sample id=2715912. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2479037. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2489055. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2470821. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2750593. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2726545. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2740995. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2755525. 
Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2742007. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2717563. Maximum sequence length: 2049, sample length: 3290 [default0]:Skipping sample id=2725196. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2718634. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2729724. Maximum sequence length: 2049, sample length: 5544 [default0]:Skipping sample id=2730129. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2748247. Maximum sequence length: 2049, sample length: 4694 [default0]:Skipping sample id=2489615. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726187. Maximum sequence length: 2049, sample length: 4995 [default0]:Skipping sample id=2737147. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2712711. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2748895. Maximum sequence length: 2049, sample length: 5154 [default0]:Skipping sample id=2750317. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2482984. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2730703. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2485655. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2480992. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2719532. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2479940. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2477855. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2729252. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2718176. 
Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2733711. Maximum sequence length: 2049, sample length: 4618 [default0]:Skipping sample id=2745420. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2746084. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2470138. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2712704. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2742712. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2717341. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2714608. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2738946. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2471291. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2729346. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2751245. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748703. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2713447. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2727606. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2755406. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2488401. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2719821. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2752131. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2726489. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2718808. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2744923. 
Maximum sequence length: 2049, sample length: 3702 [default0]:Skipping sample id=2731054. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2755509. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2739151. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2498243. Maximum sequence length: 2049, sample length: 4282 [default0]:Skipping sample id=2715348. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2478074. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2742773. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2725410. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2712342. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2487459. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2738623. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2727583. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2746538. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2718544. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2495686. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2724967. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2751953. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2717344. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2738713. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2750567. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2720375. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2725135. 
Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2743062. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2734438. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2755835. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2712922. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2477707. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2715746. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2744251. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2721824. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2721019. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2737966. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2738806. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2751935. Maximum sequence length: 2049, sample length: 3914 [default0]:Skipping sample id=2715745. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2744725. Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2482657. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2723581. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2478866. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2751437. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2498863. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2753115. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2715392. Maximum sequence length: 2049, sample length: 3467 [default0]:Skipping sample id=2752772. 
Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2716611. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2738905. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2750420. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2742482. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2756911. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2721902. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2726788. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2728793. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2478063. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2750864. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2729262. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2721589. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2721344. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2483400. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2728628. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2742342. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2728889. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2723334. Maximum sequence length: 2049, sample length: 4875 [default0]:Skipping sample id=2471076. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2738269. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2725051. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2738533. 
Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2732134. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2753429. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2719986. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2718758. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2717002. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2714464. Maximum sequence length: 2049, sample length: 3353 [default0]:Skipping sample id=2716195. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2496080. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2750760. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736050. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2715164. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2747429. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2720413. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2740794. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2732866. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2713267. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2481597. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2741429. Maximum sequence length: 2049, sample length: 4309 [default0]:Skipping sample id=2749438. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2742023. Maximum sequence length: 2049, sample length: 4998 [default0]:Skipping sample id=2488128. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2739197. 
Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2744492. Maximum sequence length: 2049, sample length: 4495 [default0]:Skipping sample id=2491350. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2721635. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2749791. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2466328. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2725350. Maximum sequence length: 2049, sample length: 6272 [default0]:Skipping sample id=2716942. Maximum sequence length: 2049, sample length: 7283 [default0]:Skipping sample id=2741488. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2714479. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712402. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2745831. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2726403. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2720858. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2748287. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2725904. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2753041. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2712196. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2713065. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2715213. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2718287. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2735596. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2756233. 
Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2738563. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2716983. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2756092. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2727893. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2729835. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2740608. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2738914. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2735886. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2720278. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2737357. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2735911. Maximum sequence length: 2049, sample length: 5532 [default0]:Skipping sample id=2719230. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2719622. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2754537. Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2482397. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2722552. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2726028. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2749510. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2483889. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2736430. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2489986. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2735040. 
Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2486623. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2755733. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2742476. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2747555. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2742105. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2722663. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2471131. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2735033. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2718107. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2747475. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2730321. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2739533. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2736420. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2720556. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2737508. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2489501. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2497228. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2740423. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2740059. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2718495. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2480452. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2752437. 
Maximum sequence length: 2049, sample length: 5835 [default0]:Skipping sample id=2732537. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2751148. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2730404. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2719652. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2742979. Maximum sequence length: 2049, sample length: 4594 [default0]:Skipping sample id=2718832. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2712794. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2715744. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2718723. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2750653. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2724897. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2748245. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2736721. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2748337. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2715261. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2749559. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2756449. Maximum sequence length: 2049, sample length: 4556 [default0]:Skipping sample id=2747955. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2733882. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2749607. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2723573. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2497929. 
Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2718310. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2486097. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2740093. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2487412. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2717201. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2728117. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2724022. Maximum sequence length: 2049, sample length: 5773 [default0]:Skipping sample id=2746951. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2746280. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2478353. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2733796. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756293. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2744041. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2715296. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2717645. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2745372. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2718669. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2744893. Maximum sequence length: 2049, sample length: 8121 [default0]:Skipping sample id=2714019. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2730465. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2716362. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2745621. 
Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2728761. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2487321. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2740395. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2720330. Maximum sequence length: 2049, sample length: 4184 [default0]:Skipping sample id=2720197. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2748461. Maximum sequence length: 2049, sample length: 4105 [default0]:Skipping sample id=2724155. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2479310. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2713745. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2482061. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2733917. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2737740. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2745189. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2730605. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2713595. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2741002. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2732018. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2747871. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2719804. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2484659. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2751442. Maximum sequence length: 2049, sample length: 3649 [default0]:Skipping sample id=2491166. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2736352. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2732246. Maximum sequence length: 2049, sample length: 5302 [default0]:Skipping sample id=2727663. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2737115. Maximum sequence length: 2049, sample length: 4065 [default0]:Skipping sample id=2752996. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2733310. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2719703. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2745408. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2723978. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2742802. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2749710. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2722794. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2731903. Maximum sequence length: 2049, sample length: 2592 [default0]:Skipping sample id=2736812. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2723347. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2719003. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2725246. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2727947. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2721055. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2716119. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2723989. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2747687. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2715049. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2729080. Maximum sequence length: 2049, sample length: 4971 [default0]:Skipping sample id=2718480. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2756676. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2737205. Maximum sequence length: 2049, sample length: 4836 [default0]:Skipping sample id=2717868. Maximum sequence length: 2049, sample length: 6302 [default0]:Skipping sample id=2753937. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2732860. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2739637. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2730446. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2730603. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2747764. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2499067. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2728137. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2753181. Maximum sequence length: 2049, sample length: 5210 [default0]:Skipping sample id=2713662. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2740863. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2467071. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2719668. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2751456. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2726513. Maximum sequence length: 2049, sample length: 5928 [default0]:Skipping sample id=2754378. 
Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2753421. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2717074. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2481302. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2730800. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2470786. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2721988. Maximum sequence length: 2049, sample length: 5319 [default0]:Skipping sample id=2734980. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2719060. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2736601. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2756332. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2722659. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2711289. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2742561. Maximum sequence length: 2049, sample length: 6760 [default0]:Skipping sample id=2714257. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2721601. Maximum sequence length: 2049, sample length: 3852 [default0]:Skipping sample id=2734080. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2480362. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2720089. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2723717. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2743224. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2729122. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2715533. 
Maximum sequence length: 2049, sample length: 5146 [default0]:Skipping sample id=2717723. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2732035. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2754259. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2726911. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2729224. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2741948. Maximum sequence length: 2049, sample length: 3519 [default0]:Skipping sample id=2725573. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2723608. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2718989. Maximum sequence length: 2049, sample length: 5373 [default0]:Skipping sample id=2723025. Maximum sequence length: 2049, sample length: 4165 [default0]:Skipping sample id=2750291. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2745285. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2716124. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2735809. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2734849. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2745584. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2731818. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2730045. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2729298. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2738636. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2725791. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2749137. 
Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2727429. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2712525. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2755193. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2725537. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2733866. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2711598. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2492524. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2494982. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2490012. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2723023. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2713913. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2487620. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2490183. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2756571. Maximum sequence length: 2049, sample length: 3647 [default0]:Skipping sample id=2733021. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742953. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2744092. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2737116. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2754688. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2725642. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2751696. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2724189. 
Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2729849. Maximum sequence length: 2049, sample length: 5155 [default0]:Skipping sample id=2728980. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2719027. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2733503. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2717634. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2716571. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2711281. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2493795. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2752614. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2730091. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2738371. Maximum sequence length: 2049, sample length: 6099 [default0]:Skipping sample id=2741569. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2720897. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2735672. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2712855. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2730768. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2749883. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2730315. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2712375. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2755292. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2714976. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2743673. 
Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2733358. Maximum sequence length: 2049, sample length: 4899 [default0]:Skipping sample id=2719714. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2742783. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2730727. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2720075. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2754080. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2713865. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2497860. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2728071. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2745791. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2484485. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2742421. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2756124. Maximum sequence length: 2049, sample length: 6146 [default0]:Skipping sample id=2739169. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2741494. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2724726. Maximum sequence length: 2049, sample length: 3658 [default0]:Skipping sample id=2717229. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2713227. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2480038. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2466518. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2486872. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2746788. 
Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2746424. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2751264. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2479376. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2747009. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2722291. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2487114. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2752006. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2735945. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2479467. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2486614. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2753015. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2750444. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2740284. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2753789. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2714739. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2735450. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2725668. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2739717. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2747134. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2733085. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2729018. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2712473. 
Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2743817. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2739757. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2469031. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2724371. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2711086. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2493559. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2726876. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2716667. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2755555. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2481063. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2720490. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2744345. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2721160. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2746985. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740535. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2711995. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2467451. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2468252. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2741624. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2753376. Maximum sequence length: 2049, sample length: 4692 [default0]:Skipping sample id=2720155. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2753189. 
Maximum sequence length: 2049, sample length: 3710 [default0]:Skipping sample id=2729156. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2729081. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2489779. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484347. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2746906. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2725780. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2737420. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2720768. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2730278. Maximum sequence length: 2049, sample length: 4769 [default0]:Skipping sample id=2756397. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2733206. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2726336. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2733957. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2468197. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2726815. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2755540. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2715994. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2721536. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2736591. Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2721135. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2725407. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2482017. 
Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2750611. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2726111. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2722969. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2483602. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2479973. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2721178. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2749287. Maximum sequence length: 2049, sample length: 4409 [default0]:Skipping sample id=2733817. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2713687. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2736463. Maximum sequence length: 2049, sample length: 4303 [default0]:Skipping sample id=2754603. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2744006. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2745116. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2468696. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2738870. Maximum sequence length: 2049, sample length: 4786 [default0]:Skipping sample id=2714058. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2739393. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2721104. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2499058. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2738095. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2478672. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2719701. 
Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2714179. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2751739. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2735097. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2711356. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2719565. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2482551. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2737044. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2729112. Maximum sequence length: 2049, sample length: 3677 [default0]:Skipping sample id=2717158. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2752925. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754826. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2487087. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2731282. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2729008. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2718265. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2714062. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2738203. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2727191. Maximum sequence length: 2049, sample length: 4156 [default0]:Skipping sample id=2468405. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2488260. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2754345. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2711615. 
Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2484943. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2717530. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2727439. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2736046. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2755630. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2731532. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732780. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2479902. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2741833. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2720982. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2722253. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2717016. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2729477. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2727598. Maximum sequence length: 2049, sample length: 5336 [default0]:Skipping sample id=2749512. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2749660. Maximum sequence length: 2049, sample length: 5450 [default0]:Skipping sample id=2739825. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2741571. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726541. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2752695. Maximum sequence length: 2049, sample length: 3998 [default0]:Skipping sample id=2730747. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2739384. 
Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2735556. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2750852. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2497289. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2720422. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2746939. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2738140. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2717481. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2716085. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2729756. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2717450. Maximum sequence length: 2049, sample length: 6492 [default0]:Skipping sample id=2747898. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2489916. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2750939. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2748831. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2739425. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2711082. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2494539. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2733986. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2753726. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2730658. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2742050. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2744387. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2718546. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2720796. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2755960. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2743931. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2749758. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2731129. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2469219. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2714924. Maximum sequence length: 2049, sample length: 4597 [default0]:Skipping sample id=2731592. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2744381. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2746756. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2749206. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2712994. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2713898. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2715552. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2728070. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2712303. Maximum sequence length: 2049, sample length: 6956 [default0]:Skipping sample id=2749111. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2714281. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2740887. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2732781. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2722732. 
Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2712061. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2715077. Maximum sequence length: 2049, sample length: 7785 [default0]:Skipping sample id=2749427. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2743761. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2748646. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2714104. Maximum sequence length: 2049, sample length: 4572 [default0]:Skipping sample id=2716045. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2757082. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2719148. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2721106. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2488608. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2712457. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2470037. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2725182. Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2483978. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2726548. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2726636. Maximum sequence length: 2049, sample length: 5191 [default0]:Skipping sample id=2756497. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2739895. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2753268. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2752465. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2733563. 
Maximum sequence length: 2049, sample length: 4171 [default0]:Skipping sample id=2735798. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2741828. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2747743. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2722480. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2489938. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2730996. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2739527. Maximum sequence length: 2049, sample length: 4558 [default0]:Skipping sample id=2731663. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2719252. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2716134. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2735349. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2733486. Maximum sequence length: 2049, sample length: 4928 [default0]:Skipping sample id=2481366. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2749036. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2722820. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2715553. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2739559. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2497449. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2715284. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2721410. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2712436. Maximum sequence length: 2049, sample length: 4548 [default0]:Skipping sample id=2747716. 
Maximum sequence length: 2049, sample length: 7109 [default0]:Skipping sample id=2742666. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2716032. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2712023. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2713048. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2468323. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2733195. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737657. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2736402. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2715472. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2493312. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2713142. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2495118. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2716953. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2731131. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2748819. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2721584. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2731584. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2722010. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2750584. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2736772. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2754397. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739887. 
Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2492494. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2725006. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2741175. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2723705. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2489683. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2741732. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2731617. Maximum sequence length: 2049, sample length: 4852 [default0]:Skipping sample id=2741939. Maximum sequence length: 2049, sample length: 5704 [default0]:Skipping sample id=2747337. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2742539. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2744649. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2487515. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2732389. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2732917. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2756528. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2724377. Maximum sequence length: 2049, sample length: 8241 [default0]:Skipping sample id=2733422. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2736511. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2744272. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2738224. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2477772. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2755503. 
Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2733266. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2734109. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2712757. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2718909. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2725875. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2726205. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2739258. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2742791. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2717793. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2716600. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2470249. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2728096. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2716151. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2748117. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2727034. Maximum sequence length: 2049, sample length: 2922 [default0]:Skipping sample id=2748148. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2746372. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2737161. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2721132. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2723986. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2735859. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2727610. 
Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2712367. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2734011. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2728142. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2743155. Maximum sequence length: 2049, sample length: 5174 [default0]:Skipping sample id=2732320. Maximum sequence length: 2049, sample length: 4156 [default0]:Skipping sample id=2722349. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2713141. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2718648. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2720602. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2724161. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2716420. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2741603. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2723287. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2720682. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2719855. Maximum sequence length: 2049, sample length: 4293 [default0]:Skipping sample id=2741921. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2756196. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2498815. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2753698. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2713155. Maximum sequence length: 2049, sample length: 7210 [default0]:Skipping sample id=2747998. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2711815. 
Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2754149. Maximum sequence length: 2049, sample length: 6439 [default0]:Skipping sample id=2467010. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2719694. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2740259. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2737956. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2494039. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2747643. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2716694. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2740113. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2481746. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2746895. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2729247. Maximum sequence length: 2049, sample length: 5363 [default0]:Skipping sample id=2717698. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2744537. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2721263. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2733179. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2747019. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2726289. Maximum sequence length: 2049, sample length: 4275 [default0]:Skipping sample id=2735385. Maximum sequence length: 2049, sample length: 3665 [default0]:Skipping sample id=2722256. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2484344. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2745199. 
Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2756167. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2737794. Maximum sequence length: 2049, sample length: 5204 [default0]:Skipping sample id=2733513. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2487165. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2722604. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711480. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2713767. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2719371. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2748165. Maximum sequence length: 2049, sample length: 4159 [default0]:Skipping sample id=2713018. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2717348. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2752346. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2746967. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721384. Maximum sequence length: 2049, sample length: 5945 [default0]:Skipping sample id=2740455. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2733271. Maximum sequence length: 2049, sample length: 2727 [default0]:Skipping sample id=2742760. Maximum sequence length: 2049, sample length: 4936 [default0]:Skipping sample id=2712249. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2741423. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714061. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2470746. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2748834. 
Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2720425. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2721334. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2742923. Maximum sequence length: 2049, sample length: 5205 [default0]:Skipping sample id=2742389. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2498145. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2467195. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2742697. Maximum sequence length: 2049, sample length: 3948 [default0]:Skipping sample id=2716245. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2732862. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2738796. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2739235. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2480629. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2466464. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2740326. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2755713. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2730972. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2756678. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2721894. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2742088. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2711941. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2716645. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2712888. 
Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2752235. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2742437. Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2733243. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2751535. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2724706. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2749074. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2711172. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2469566. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2488305. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2738963. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2727280. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2715591. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2729775. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2481913. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2728701. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2721780. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2722948. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2714814. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2749054. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2749343. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2722444. Maximum sequence length: 2049, sample length: 7112 [default0]:Skipping sample id=2735933. 
Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2729134. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2729646. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2731109. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2489373. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2737753. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2732955. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2741714. Maximum sequence length: 2049, sample length: 4805 [default0]:Skipping sample id=2736101. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2751647. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2730904. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2752077. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2737015. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2717075. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2717722. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2748255. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2726634. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2752314. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2754299. Maximum sequence length: 2049, sample length: 4777 [default0]:Skipping sample id=2753944. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2714324. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2728779. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2722972. 
Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2744803. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2741581. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2712114. Maximum sequence length: 2049, sample length: 5466 [default0]:Skipping sample id=2717434. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2754053. Maximum sequence length: 2049, sample length: 6221 [default0]:Skipping sample id=2753422. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2722075. Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2716116. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2744834. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2756181. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2725908. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2738858. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2736344. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2490103. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2755010. Maximum sequence length: 2049, sample length: 5149 [default0]:Skipping sample id=2712950. Maximum sequence length: 2049, sample length: 5439 [default0]:Skipping sample id=2747915. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721252. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2744332. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2736604. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2716438. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2740194. 
Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2747369. Maximum sequence length: 2049, sample length: 5529 [default0]:Skipping sample id=2753599. Maximum sequence length: 2049, sample length: 3331 [default0]:Skipping sample id=2732436. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2754475. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2736450. Maximum sequence length: 2049, sample length: 5350 [default0]:Skipping sample id=2724519. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2484407. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2735800. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2718878. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2715568. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2718345. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2714971. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2716334. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2752108. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2741347. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2751122. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2737469. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2744311. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2747389. Maximum sequence length: 2049, sample length: 5019 [default0]:Skipping sample id=2746688. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2749497. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2712425. 
Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2712390. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2728562. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2742426. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2735191. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2752321. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2748237. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2737640. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2717095. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2730369. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2744299. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2755026. Maximum sequence length: 2049, sample length: 8477 [default0]:Skipping sample id=2716921. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2726201. Maximum sequence length: 2049, sample length: 6531 [default0]:Skipping sample id=2719618. Maximum sequence length: 2049, sample length: 7617 [default0]:Skipping sample id=2716225. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2750122. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2731857. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2756819. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2718781. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2751637. Maximum sequence length: 2049, sample length: 3720 [default0]:Skipping sample id=2729116. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2724508. 
Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2744559. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2724261. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2729837. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2749429. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2731515. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2478255. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2490799. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2732739. Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2748597. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2495370. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2717708. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2733630. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2754438. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2739959. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2711608. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2741568. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2741977. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2731582. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2493549. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2713435. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2756424. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2499372. 
Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2731019. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2725032. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2477031. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2718246. Maximum sequence length: 2049, sample length: 6543 [default0]:Skipping sample id=2746518. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2478646. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2753217. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2484670. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2713405. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2730847. Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2714182. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2487867. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2725164. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2733753. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2735308. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2744462. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2748122. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2751857. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2714946. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2729997. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2751061. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2720800. 
Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2739905. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2496906. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2741926. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2727877. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2467009. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2744221. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2721217. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2494023. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2728502. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2715995. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2711438. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2736263. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2725628. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2733448. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2727318. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2737167. Maximum sequence length: 2049, sample length: 5028 [default0]:Skipping sample id=2731363. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2737326. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2727159. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2466136. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2732709. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2723163. 
Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2724358. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2730039. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2719372. Maximum sequence length: 2049, sample length: 5207 [default0]:Skipping sample id=2750379. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2720125. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2717119. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2729285. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2485524. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2721939. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2756109. Maximum sequence length: 2049, sample length: 7562 [default0]:Skipping sample id=2756725. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2494042. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2751941. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2725186. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2742085. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2495180. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2733649. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2721129. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2742331. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2467022. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2715954. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2753570. 
Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2735604. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2743310. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2717489. Maximum sequence length: 2049, sample length: 4977 [default0]:Skipping sample id=2725142. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2750603. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2730165. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2724868. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2497913. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2733890. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2716039. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2743612. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2729320. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2734990. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2491277. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2726037. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2743334. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2743275. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2753161. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2486661. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2711924. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2736444. Maximum sequence length: 2049, sample length: 4557 [default0]:Skipping sample id=2726303. 
Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2737805. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2716051. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2489204. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2735130. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2496140. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2748821. Maximum sequence length: 2049, sample length: 4777 [default0]:Skipping sample id=2736445. Maximum sequence length: 2049, sample length: 4978 [default0]:Skipping sample id=2751956. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2717334. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2713350. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2711913. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2744576. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2749089. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2740402. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2727994. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2722000. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2755624. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2729880. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2749119. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2755595. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2743493. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725810. 
Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2494846. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2729357. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2741262. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2733164. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2740885. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2751286. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2716535. Maximum sequence length: 2049, sample length: 4052 [default0]:Skipping sample id=2727050. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2493652. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2731475. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2732508. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2733908. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2754392. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2748630. Maximum sequence length: 2049, sample length: 2689 [default0]:Skipping sample id=2750199. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2749882. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2752650. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2725699. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2499200. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2719471. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2744502. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2736383. 
Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2742946. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2470834. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2466461. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2718389. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2743478. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2719839. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2719774. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2730424. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2737613. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2724738. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2740417. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2731636. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2737480. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2733510. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2711998. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2724082. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2732351. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2744075. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2756669. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2755901. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2748072. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2742766. 
Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2749065. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2729437. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2490881. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2716563. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2741074. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2737884. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2734679. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2739018. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2741484. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2732355. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2718324. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2722178. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2724554. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2731157. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2720067. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2737410. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2753228. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2751700. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2720663. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2736593. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2720342. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2754026. 
Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2716609. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2496176. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2734967. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2470036. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2721229. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2719278. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2714013. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2715980. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2738540. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2723208. Maximum sequence length: 2049, sample length: 5446 [default0]:Skipping sample id=2720086. Maximum sequence length: 2049, sample length: 3821 [default0]:Skipping sample id=2746825. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2723659. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2720557. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2722506. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2722683. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2734245. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2755646. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2739324. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2720137. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2713949. Maximum sequence length: 2049, sample length: 6017 [default0]:Skipping sample id=2735976. 
Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2725675. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2467403. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2726903. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2711370. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2722017. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2720264. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2748984. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2723810. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2750545. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2732996. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2714667. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2736703. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2744746. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2743143. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2738528. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2712573. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2467107. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2724856. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2466221. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2724021. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2733548. Maximum sequence length: 2049, sample length: 3520 [default0]:Skipping sample id=2470479. 
Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2495265. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2742538. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2720022. Maximum sequence length: 2049, sample length: 6626 [default0]:Skipping sample id=2716440. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2749964. Maximum sequence length: 2049, sample length: 3784 [default0]:Skipping sample id=2743706. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2744189. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2756554. Maximum sequence length: 2049, sample length: 3948 [default0]:Skipping sample id=2715867. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2736207. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2738839. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2483803. Maximum sequence length: 2049, sample length: 3392 [default0]:Skipping sample id=2728448. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2744057. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2713623. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2738959. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2734455. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2721599. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2721664. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2736273. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2741198. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2753521. 
Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2749731. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2714957. Maximum sequence length: 2049, sample length: 3821 [default0]:Skipping sample id=2717890. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2713189. Maximum sequence length: 2049, sample length: 6491 [default0]:Skipping sample id=2734643. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2755156. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2738802. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2714726. Maximum sequence length: 2049, sample length: 4293 [default0]:Skipping sample id=2745337. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2740640. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2746420. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2736741. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2478253. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2717354. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2739062. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2715757. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2713638. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2714790. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2728114. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2741447. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2711311. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2713814. 
Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2488072. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2729223. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2496421. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2720863. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2752978. Maximum sequence length: 2049, sample length: 4694 [default0]:Skipping sample id=2496868. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2731293. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2482778. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2722845. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2726601. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2732858. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2718244. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2730893. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2751861. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2466375. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2748198. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2754767. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2715898. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2750481. Maximum sequence length: 2049, sample length: 3667 [default0]:Skipping sample id=2726157. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2740694. Maximum sequence length: 2049, sample length: 3655 [default0]:Skipping sample id=2492546. 
Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2732174. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2740723. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2725247. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2743834. Maximum sequence length: 2049, sample length: 3730 [default0]:Skipping sample id=2486642. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2727235. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2755480. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2494710. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2735758. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2721425. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2482795. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2734261. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2496503. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2489886. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2477121. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2731369. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2483784. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2727069. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2755512. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2713115. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2754735. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2747234. 
Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736160. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2749574. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2714476. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2712435. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2488424. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2732873. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2713054. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2483786. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2482275. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2729876. Maximum sequence length: 2049, sample length: 7607 [default0]:Skipping sample id=2712652. Maximum sequence length: 2049, sample length: 4802 [default0]:Skipping sample id=2498168. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2749376. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2739443. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2477255. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2742694. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2750016. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2711961. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2495477. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2731280. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2742334. Maximum sequence length: 2049, sample length: 4837 [default0]:Skipping sample id=2480723. 
Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2749806. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2716003. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2751510. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2484316. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2726953. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2746516. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2751863. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2712480. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2715413. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2746558. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2739900. Maximum sequence length: 2049, sample length: 4087 [default0]:Skipping sample id=2723290. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2717532. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2749116. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2741316. Maximum sequence length: 2049, sample length: 3222 [default0]:Skipping sample id=2727845. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2724640. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2747193. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2493578. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2487003. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2734681. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2713303. 
Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2751313. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2716549. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2741292. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2723871. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2714549. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2711714. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2480097. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2712175. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2752830. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2726623. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2746738. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2725519. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2750221. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2467623. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2478939. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2726612. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2720541. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2726658. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2749953. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2734890. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726851. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2716969. 
Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2720534. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2712408. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2749606. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2737734. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2468040. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2718893. Maximum sequence length: 2049, sample length: 5766 [default0]:Skipping sample id=2727546. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2734695. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2718751. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2751287. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2742827. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2738359. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2721569. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2721360. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2742875. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2734996. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2715734. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2731273. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2723951. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2749541. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2715551. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2753129. 
Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2748474. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2739768. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2495729. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2721876. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2493694. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2754207. Maximum sequence length: 2049, sample length: 5541 [default0]:Skipping sample id=2719563. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2479604. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2748152. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2746035. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2755644. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2714631. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2756584. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2750146. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2495676. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2485129. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2725302. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723926. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2749183. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2716934. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2754838. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2750955. 
Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2712439. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2713527. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2489330. Maximum sequence length: 2049, sample length: 3885 [default0]:Skipping sample id=2742925. Maximum sequence length: 2049, sample length: 4952 [default0]:Skipping sample id=2723228. Maximum sequence length: 2049, sample length: 4945 [default0]:Skipping sample id=2747795. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2745680. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2743422. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2715671. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2731752. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2747509. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2724710. Maximum sequence length: 2049, sample length: 6645 [default0]:Skipping sample id=2497896. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2727455. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2746604. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2716754. Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2748924. Maximum sequence length: 2049, sample length: 6222 [default0]:Skipping sample id=2756675. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2749221. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2756754. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2756239. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722684. 
Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2740043. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2738246. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2733254. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2488499. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2726774. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2726882. Maximum sequence length: 2049, sample length: 4094 [default0]:Skipping sample id=2742598. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2490994. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2486251. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2733225. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2725048. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2734454. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2746878. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2733313. Maximum sequence length: 2049, sample length: 4897 [default0]:Skipping sample id=2724880. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2721245. Maximum sequence length: 2049, sample length: 4327 [default0]:Skipping sample id=2754653. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2714788. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2470666. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2714756. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2717736. Maximum sequence length: 2049, sample length: 2706 [default0]:Skipping sample id=2720995. 
Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2736163. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2738692. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2730770. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2711827. Maximum sequence length: 2049, sample length: 3485 [default0]:Skipping sample id=2467414. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2756190. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2726318. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2721997. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2711578. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2740433. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2484231. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2751572. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2729318. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2495812. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2752600. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2744638. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2731386. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2737114. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2740814. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2739046. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2756540. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2731888. 
Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2722941. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2750378. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2734942. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2722697. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2736142. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2720181. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2479191. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2727298. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2711723. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2745664. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2716092. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2735142. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2495326. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2471207. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2725131. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2748558. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2719500. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2717460. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2737735. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2744233. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2470913. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2736685. 
Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2722259. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2731184. Maximum sequence length: 2049, sample length: 3707 [default0]:Skipping sample id=2756776. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2466546. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2498674. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2738125. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2731212. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2740824. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2741526. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2727867. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2754799. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2734039. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2487244. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2717017. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2744519. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733757. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2714598. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2716686. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2752845. Maximum sequence length: 2049, sample length: 7108 [default0]:Skipping sample id=2730864. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727495. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2739636. 
Maximum sequence length: 2049, sample length: 3346 [default0]:Skipping sample id=2726936. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2468854. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2746134. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2731787. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2494895. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2729335. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2724591. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2723571. Maximum sequence length: 2049, sample length: 3739 [default0]:Skipping sample id=2719146. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2488979. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2750735. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2715316. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2734858. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2723318. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2482066. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2488711. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726141. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2715609. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2480504. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2721087. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2731313. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2725955. 
Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2721630. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2726394. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2484442. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2735413. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2737643. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2477590. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2730999. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2711319. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2719039. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2714996. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2730382. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2728504. Maximum sequence length: 2049, sample length: 6533 [default0]:Skipping sample id=2743512. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2747485. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2717153. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2754878. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2739005. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2715432. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2736501. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2498490. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2733017. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2721686. 
Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2731387. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2723777. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2718364. Maximum sequence length: 2049, sample length: 4441 [default0]:Skipping sample id=2744703. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2733090. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2753366. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2753022. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2719501. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2741893. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2494622. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2485279. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2712668. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2718876. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2735271. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2720483. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2754214. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2720776. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2711762. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2715480. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2746676. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2720696. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2729760. 
Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2479343. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2716916. Maximum sequence length: 2049, sample length: 6934 [default0]:Skipping sample id=2724061. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2750328. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2726302. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2744588. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2739268. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2736072. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2485172. Maximum sequence length: 2049, sample length: 3548 [default0]:Skipping sample id=2486933. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2751829. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2741536. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2720851. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2753281. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2731932. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2498116. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2750898. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2740412. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2751067. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714774. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2466802. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2740131. 
Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2746400. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2746408. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2727102. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2715044. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2711070. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2477793. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2746649. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2713970. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2726243. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2494128. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2717374. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2717496. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2740605. Maximum sequence length: 2049, sample length: 3482 [default0]:Skipping sample id=2739504. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2751489. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2731023. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2742340. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2738107. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2723116. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2725307. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2718015. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2716365. 
Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2720699. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2754079. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2735627. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2712885. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2752206. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2748614. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2755570. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2756303. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2740916. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2735374. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2753317. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2736711. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2725372. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2742971. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2736476. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2749253. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2726728. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2720105. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2719093. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2468709. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2726299. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2719256. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2734775. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2714661. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2478863. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2490445. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2716760. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2720345. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2496187. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2736083. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2755981. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2734529. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2493824. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2752792. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2484222. Maximum sequence length: 2049, sample length: 3621 [default0]:Skipping sample id=2717099. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2738068. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2469652. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2492805. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2728510. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2721830. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2490794. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2730737. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2742861. 
Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2751827. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2721581. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2735174. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2756572. Maximum sequence length: 2049, sample length: 3896 [default0]:Skipping sample id=2721163. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2742480. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2742397. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2711522. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2728706. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2746382. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2717689. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2487584. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2728067. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2748015. Maximum sequence length: 2049, sample length: 4908 [default0]:Skipping sample id=2723421. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2480694. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2745500. Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2716593. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2468015. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2737959. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2714294. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2729906. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2718354. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2756115. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2713836. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2725561. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2752030. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2714520. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2715343. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2729968. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2727756. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2736554. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2722119. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2711114. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2754130. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2480084. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2747574. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2754761. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2737971. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2487219. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2729910. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2732747. Maximum sequence length: 2049, sample length: 5507 [default0]:Skipping sample id=2736488. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2713680. 
Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2744937. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2749946. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2748179. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2749834. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2744715. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2755484. Maximum sequence length: 2049, sample length: 6423 [default0]:Skipping sample id=2740208. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2726272. Maximum sequence length: 2049, sample length: 3953 [default0]:Skipping sample id=2715626. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2718922. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750685. Maximum sequence length: 2049, sample length: 3513 [default0]:Skipping sample id=2730978. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2741665. Maximum sequence length: 2049, sample length: 4231 [default0]:Skipping sample id=2753768. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2716376. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2478669. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2747720. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2478609. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2724646. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2747747. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2716624. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2725310. 
Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2748101. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2485062. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2755925. Maximum sequence length: 2049, sample length: 6451 [default0]:Skipping sample id=2730556. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2721675. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2486722. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2751817. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2744791. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2742266. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2496734. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2467368. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2754118. Maximum sequence length: 2049, sample length: 3537 [default0]:Skipping sample id=2485434. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2741534. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2743108. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2713875. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2753823. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2496258. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2711803. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2738449. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2713200. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2724236. 
Maximum sequence length: 2049, sample length: 5430 [default0]:Skipping sample id=2712726. Maximum sequence length: 2049, sample length: 6228 [default0]:Skipping sample id=2745410. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2749138. Maximum sequence length: 2049, sample length: 4193 [default0]:Skipping sample id=2740522. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2743678. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2719547. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2495123. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2746957. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2722005. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2719903. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2711227. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2493778. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2736658. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2736510. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2727777. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2494855. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2713826. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2718627. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2756934. Maximum sequence length: 2049, sample length: 3339 [default0]:Skipping sample id=2752703. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2734219. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2716452. 
Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2731519. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2722721. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2726247. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2754024. Maximum sequence length: 2049, sample length: 5151 [default0]:Skipping sample id=2729981. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2726292. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2715158. Maximum sequence length: 2049, sample length: 5984 [default0]:Skipping sample id=2741564. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2754421. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2478742. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2747461. Maximum sequence length: 2049, sample length: 4547 [default0]:Skipping sample id=2712239. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2743264. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2744180. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2756668. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2716786. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2728802. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2744598. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2484816. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2749128. Maximum sequence length: 2049, sample length: 4795 [default0]:Skipping sample id=2750879. Maximum sequence length: 2049, sample length: 4474 [default0]:Skipping sample id=2743949. 
Maximum sequence length: 2049, sample length: 5536 [default0]:Skipping sample id=2731189. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2732054. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2748082. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2731174. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2731364. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2728529. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2481190. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2748333. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2745484. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2741257. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2752072. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2741432. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2716482. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2749085. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2755049. Maximum sequence length: 2049, sample length: 3222 [default0]:Skipping sample id=2757065. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2751145. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2745756. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2751605. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2721028. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2740226. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2736730. 
Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2727012. Maximum sequence length: 2049, sample length: 6050 [default0]:Skipping sample id=2753049. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2724043. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2755152. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2728318. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2722587. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2728113. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2468207. Maximum sequence length: 2049, sample length: 4327 [default0]:Skipping sample id=2720020. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2481893. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2746950. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2735774. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2745426. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2734264. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2745438. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2742617. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2711848. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2481298. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2721124. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2741279. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2712403. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2731772. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2738666. Maximum sequence length: 2049, sample length: 6215 [default0]:Skipping sample id=2729476. Maximum sequence length: 2049, sample length: 5807 [default0]:Skipping sample id=2740252. Maximum sequence length: 2049, sample length: 4731 [default0]:Skipping sample id=2755552. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2467096. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2486920. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2729400. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2711147. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2746381. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2753320. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2729873. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2483762. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2485727. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2713119. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2497371. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2465724. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2713417. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2736929. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2736036. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2478924. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2732076. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2745561. 
Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2726617. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2494357. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2718257. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2715244. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2734687. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2481918. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2478339. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2736395. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2723577. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2733712. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2729781. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2723474. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2715080. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2741547. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2748510. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2725623. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745070. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2749692. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2756069. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2722833. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2732499. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2724104. 
Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2736437. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2747707. Maximum sequence length: 2049, sample length: 4043 [default0]:Skipping sample id=2738576. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2747539. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2733545. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2733159. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2746258. Maximum sequence length: 2049, sample length: 5265 [default0]:Skipping sample id=2720803. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2721118. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2751673. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2488986. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2756711. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2720539. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2713878. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2738782. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2737875. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2726323. Maximum sequence length: 2049, sample length: 5995 [default0]:Skipping sample id=2726837. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2745706. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2495109. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2757014. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2753508. 
Maximum sequence length: 2049, sample length: 4860 [default0]:Skipping sample id=2721333. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2713506. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2489186. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2713201. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2495651. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2742456. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2738503. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2736596. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2711372. Maximum sequence length: 2049, sample length: 4903 [default0]:Skipping sample id=2743497. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2490168. Maximum sequence length: 2049, sample length: 3402 [default0]:Skipping sample id=2715201. Maximum sequence length: 2049, sample length: 4184 [default0]:Skipping sample id=2711829. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2748050. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2724271. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2716259. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2724065. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2712288. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2735126. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2721222. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2714963. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2482626. 
Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2728982. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2743050. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2713022. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2727332. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2725809. Maximum sequence length: 2049, sample length: 2794 [default0]:Skipping sample id=2734457. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2750302. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2727747. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2741768. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2736125. Maximum sequence length: 2049, sample length: 3902 [default0]:Skipping sample id=2492438. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2746613. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2731034. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2718635. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2484711. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2735476. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2756691. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2723699. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2741475. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2483874. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2739824. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2733776. 
Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2712991. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2724517. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2495466. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2718304. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2733697. Maximum sequence length: 2049, sample length: 4406 [default0]:Skipping sample id=2751909. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2731800. Maximum sequence length: 2049, sample length: 5096 [default0]:Skipping sample id=2496462. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2754816. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2718626. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2746746. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2747819. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2720779. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2485594. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2723102. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2753352. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2485490. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2712381. Maximum sequence length: 2049, sample length: 5332 [default0]:Skipping sample id=2744805. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2711498. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2732477. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2740065. 
Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2742068. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2756599. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2490329. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2749061. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2750414. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2747691. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2720523. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2711419. Maximum sequence length: 2049, sample length: 6684 [default0]:Skipping sample id=2751672. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2736841. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2711362. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2492188. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2751952. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2717276. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2735414. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2752598. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2724952. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2734074. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2495102. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2743693. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2755185. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2725336. 
Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2754973. Maximum sequence length: 2049, sample length: 3529 [default0]:Skipping sample id=2735812. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2748661. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2470300. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2485637. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2751879. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2467196. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2728176. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2740255. Maximum sequence length: 2049, sample length: 3536 [default0]:Skipping sample id=2726668. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2723916. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2726002. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2738035. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2493052. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2748398. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2742011. Maximum sequence length: 2049, sample length: 3153 [default0]:Skipping sample id=2751263. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2494740. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2756562. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2725651. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2488256. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2743237. 
Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2753459. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2756621. Maximum sequence length: 2049, sample length: 4530 [default0]:Skipping sample id=2752452. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2481837. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2736348. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2721525. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2737010. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2468759. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2741620. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2731499. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2740554. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2725667. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2714298. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2756068. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714090. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2741231. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2734283. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2490892. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2730319. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2733717. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2732011. Maximum sequence length: 2049, sample length: 4093 [default0]:Skipping sample id=2716876. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2488589. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2711294. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2728452. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2719386. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756374. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732478. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2729995. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2714131. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2756766. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2713377. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2720993. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2727200. Maximum sequence length: 2049, sample length: 7273 [default0]:Skipping sample id=2750761. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2733413. Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2743744. Maximum sequence length: 2049, sample length: 4820 [default0]:Skipping sample id=2754521. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2744733. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2755103. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2719826. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2749346. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2725346. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2741769. 
Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2748516. Maximum sequence length: 2049, sample length: 4349 [default0]:Skipping sample id=2480462. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2735183. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2750389. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2711495. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2755879. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2715979. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2490591. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2750323. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2715864. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2492521. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2714317. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2469857. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2726583. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2743299. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2477555. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2726886. Maximum sequence length: 2049, sample length: 3283 [default0]:Skipping sample id=2477712. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2723113. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2734063. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2723177. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2742033. 
Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2747560. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2754860. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2481699. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2746953. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2720542. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2719954. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2732064. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2735155. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2744581. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2724347. Maximum sequence length: 2049, sample length: 5824 [default0]:Skipping sample id=2720332. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2469261. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2729090. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2734294. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2749026. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2724888. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2724348. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2717181. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2737316. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2469422. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2749704. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2725929. 
Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2719905. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2721286. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2498694. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2755145. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2733584. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2734535. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2713049. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2491696. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2753275. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2467000. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741560. Maximum sequence length: 2049, sample length: 3826 [default0]:Skipping sample id=2738364. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2740071. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2744304. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2743120. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2724329. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2711236. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2726261. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2754572. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2493054. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2744982. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2731601. 
Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2734486. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2720908. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2728315. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2748304. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2470585. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2741117. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2739348. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2747229. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2757117. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2745435. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2751831. Maximum sequence length: 2049, sample length: 4392 [default0]:Skipping sample id=2730733. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2721483. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2757025. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2751121. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2727419. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2466469. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2729167. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2738912. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2742628. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2732752. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2732333. 
Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2725450. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2722316. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2720471. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2723217. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2727328. Maximum sequence length: 2049, sample length: 5310 [default0]:Skipping sample id=2739863. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2493430. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2715429. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2719358. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2736111. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2742202. Maximum sequence length: 2049, sample length: 3094 [default0]:Skipping sample id=2745648. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2747152. Maximum sequence length: 2049, sample length: 5004 [default0]:Skipping sample id=2732759. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2724875. Maximum sequence length: 2049, sample length: 6621 [default0]:Skipping sample id=2717115. Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2728184. Maximum sequence length: 2049, sample length: 4897 [default0]:Skipping sample id=2741028. Maximum sequence length: 2049, sample length: 3409 [default0]:Skipping sample id=2745346. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2732230. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2755989. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2714414. 
Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2725275. Maximum sequence length: 2049, sample length: 2792 [default0]:Skipping sample id=2720111. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2490176. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2720899. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2715070. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2739567. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2711811. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2479596. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2751652. Maximum sequence length: 2049, sample length: 3257 [default0]:Skipping sample id=2716567. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2752388. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2746852. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2733355. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2725422. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2745449. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2719961. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2727612. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2722572. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2741983. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2750899. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2729044. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2753718. 
Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2743888. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2493776. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732408. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2734258. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2748668. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2470354. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2478805. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2477409. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2724182. Maximum sequence length: 2049, sample length: 3512 [default0]:Skipping sample id=2745312. Maximum sequence length: 2049, sample length: 4593 [default0]:Skipping sample id=2751305. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2737171. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2491259. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2717377. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2712042. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2499340. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2744338. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2738803. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2722224. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2488390. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2751784. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2716744. 
Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2745975. Maximum sequence length: 2049, sample length: 3911 [default0]:Skipping sample id=2496404. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2736736. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2741705. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2483119. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2748890. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2716852. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2741715. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2730062. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2745926. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2755261. Maximum sequence length: 2049, sample length: 3965 [default0]:Skipping sample id=2738545. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2468355. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2728902. Maximum sequence length: 2049, sample length: 5248 [default0]:Skipping sample id=2739902. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2743209. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2723851. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2719879. Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2730242. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2494081. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2718475. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2719830. 
Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2740014. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2740419. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2716938. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2740985. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2751891. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2721180. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2467847. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2756295. Maximum sequence length: 2049, sample length: 3523 [default0]:Skipping sample id=2714590. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2714639. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2744320. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2718932. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2722305. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2728576. Maximum sequence length: 2049, sample length: 6481 [default0]:Skipping sample id=2732584. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2744610. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2747050. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2726434. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2469480. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2711931. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2729126. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2718862. 
Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2470318. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2737556. Maximum sequence length: 2049, sample length: 4606 [default0]:Skipping sample id=2478116. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2750701. Maximum sequence length: 2049, sample length: 5064 [default0]:Skipping sample id=2732255. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2732952. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2750365. Maximum sequence length: 2049, sample length: 5056 [default0]:Skipping sample id=2742504. Maximum sequence length: 2049, sample length: 5080 [default0]:Skipping sample id=2749590. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2489736. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2752292. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2727900. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2725474. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2468862. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2746543. Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2719179. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2470848. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2756348. Maximum sequence length: 2049, sample length: 4008 [default0]:Skipping sample id=2735512. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2717987. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2722415. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2493109. 
Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2711625. Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2733658. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2755622. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2482938. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2741496. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2467937. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2469994. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2742967. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2741544. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2742719. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2742970. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2727310. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2719122. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2744557. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2746520. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2487881. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2717587. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2734814. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2489744. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2753568. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2732801. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2754565. 
Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2747449. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2710975. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2723663. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2715331. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2488983. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2739609. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2724198. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2731936. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2714523. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2466346. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2722991. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2743017. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2731419. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2478086. Maximum sequence length: 2049, sample length: 3890 [default0]:Skipping sample id=2493604. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2735842. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2719753. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2716767. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2479724. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2721206. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2741057. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2743780. 
Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2734669. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2744966. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2732939. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2746617. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2749436. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2724175. Maximum sequence length: 2049, sample length: 4006 [default0]:Skipping sample id=2743177. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2716213. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2737995. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2712736. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2755213. Maximum sequence length: 2049, sample length: 5864 [default0]:Skipping sample id=2729379. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2738635. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2750513. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2716422. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2482522. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2751082. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2733657. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2749800. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2466667. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2711835. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2745474. 
Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2724059. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2494578. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2736384. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2735158. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2744878. Maximum sequence length: 2049, sample length: 4467 [default0]:Skipping sample id=2747452. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2742286. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2481865. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2750441. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2713733. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2466246. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2740287. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2469180. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2733307. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2483216. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2718199. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2735290. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2718330. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2741047. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2717182. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2733963. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2731302. 
Maximum sequence length: 2049, sample length: 4063 [default0]:Skipping sample id=2718966. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724807. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2486139. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2748490. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2468159. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2484466. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2749013. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2717449. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2744688. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2734279. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2732108. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2715473. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2737152. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2740762. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2732519. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2749894. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2711219. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2730781. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2737270. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2716831. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735957. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2721008. 
Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2748061. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2722753. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2720494. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2714005. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2714031. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2743327. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2752032. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2714229. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2714919. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2486616. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2711379. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2718953. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2711752. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2733069. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2720997. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2740737. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2719496. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2725008. Maximum sequence length: 2049, sample length: 3748 [default0]:Skipping sample id=2750176. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2486656. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2720189. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2739509. 
Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2740715. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2478725. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2729088. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2712766. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2753947. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2742298. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2735866. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2743145. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2751818. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2467608. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2477483. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2736959. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2487664. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2737083. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2733065. Maximum sequence length: 2049, sample length: 4754 [default0]:Skipping sample id=2712222. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2720472. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2737417. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2746247. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2718783. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2747133. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2739115. 
Maximum sequence length: 2049, sample length: 6054 [default0]:Skipping sample id=2726786. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2731066. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2482584. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2723433. Maximum sequence length: 2049, sample length: 5682 [default0]:Skipping sample id=2724720. Maximum sequence length: 2049, sample length: 5172 [default0]:Skipping sample id=2720874. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2717009. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2477778. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2720878. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2752026. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2732792. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2480333. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2747763. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2479952. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2734381. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2728568. Maximum sequence length: 2049, sample length: 4406 [default0]:Skipping sample id=2735216. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2713341. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2468625. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2742281. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2731155. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2723276. 
Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2739430. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2749263. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2745466. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2715702. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2755846. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2745310. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2498103. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2486126. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2478657. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2720737. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2751798. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2716602. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2484071. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2720373. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2729508. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2734617. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730135. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2732914. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2723559. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2727988. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2732177. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2715691. 
Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2744321. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2729866. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2467330. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2736459. Maximum sequence length: 2049, sample length: 6514 [default0]:Skipping sample id=2732667. Maximum sequence length: 2049, sample length: 5060 [default0]:Skipping sample id=2742137. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2726618. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2735431. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2734449. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2748409. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2741402. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2727474. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2482959. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2467567. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2743686. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2731730. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2722851. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2470459. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2729883. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2722118. Maximum sequence length: 2049, sample length: 5142 [default0]:Skipping sample id=2748293. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2722522. 
Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2747618. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2738403. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2715644. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2717349. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2751990. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2494173. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2743141. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2715145. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2724379. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715442. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2754472. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2733991. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2711473. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2751654. Maximum sequence length: 2049, sample length: 3797 [default0]:Skipping sample id=2723466. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2726844. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2743593. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2712895. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2716244. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2733831. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2724440. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2744691. 
Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2481561. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2731068. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2715263. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2477892. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2494911. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2732343. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2488154. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2720072. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2734606. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2714231. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2483496. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2746642. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2737669. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2744306. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715866. Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2719101. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2712795. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2712990. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2481777. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2728215. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2731255. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2740485. 
Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2720990. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2738604. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2715836. Maximum sequence length: 2049, sample length: 3656 [default0]:Skipping sample id=2727411. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2745267. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2715793. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2733716. Maximum sequence length: 2049, sample length: 3055 [default0]:Skipping sample id=2736840. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2722219. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2738853. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2723280. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2731285. Maximum sequence length: 2049, sample length: 4529 [default0]:Skipping sample id=2754596. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2724500. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2729459. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2726506. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2749017. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2717423. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2496149. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2725939. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2491183. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2729660. 
Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2712248. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2748636. Maximum sequence length: 2049, sample length: 3048 [default0]:Skipping sample id=2732356. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2477012. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2726965. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2465984. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2466575. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2743917. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2745765. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2466054. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2467156. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2735348. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2739886. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2734420. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2756077. Maximum sequence length: 2049, sample length: 4373 [default0]:Skipping sample id=2726342. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2738047. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2756027. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2746341. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2477618. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2729568. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2498568. 
Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2482077. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2711792. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2749763. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2723042. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2740733. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2716283. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2749058. Maximum sequence length: 2049, sample length: 6975 [default0]:Skipping sample id=2486316. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2732891. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2730565. Maximum sequence length: 2049, sample length: 5980 [default0]:Skipping sample id=2725738. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2493491. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2495588. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2717896. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2495358. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2752277. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2729898. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2720524. Maximum sequence length: 2049, sample length: 5542 [default0]:Skipping sample id=2737437. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2735608. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2749667. Maximum sequence length: 2049, sample length: 5617 [default0]:Skipping sample id=2718712. 
Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2719011. Maximum sequence length: 2049, sample length: 4670 [default0]:Skipping sample id=2489496. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2493941. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2747518. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2751167. Maximum sequence length: 2049, sample length: 3058 [default0]:Skipping sample id=2715258. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2744374. Maximum sequence length: 2049, sample length: 4547 [default0]:Skipping sample id=2742857. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2725317. Maximum sequence length: 2049, sample length: 5094 [default0]:Skipping sample id=2734365. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2715601. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2728008. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730981. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746310. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2714089. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2495725. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2713571. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2715377. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2743524. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2718024. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2735300. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2485372. 
Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2718873. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2726539. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2730475. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2735387. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2748139. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2719655. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2723069. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2724093. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2747993. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2746496. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2714219. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2721075. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2753468. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2715508. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2493072. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2751923. Maximum sequence length: 2049, sample length: 4239 [default0]:Skipping sample id=2713749. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2466092. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2735147. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2737742. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2724396. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2746943. 
Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2482787. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2729839. Maximum sequence length: 2049, sample length: 6628 [default0]:Skipping sample id=2745339. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2734697. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2482371. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2753948. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2753229. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2483673. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2722136. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2720231. Maximum sequence length: 2049, sample length: 4304 [default0]:Skipping sample id=2713194. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2755412. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2746839. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2711644. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2728001. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2713594. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2734394. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2750388. Maximum sequence length: 2049, sample length: 4297 [default0]:Skipping sample id=2741215. Maximum sequence length: 2049, sample length: 3807 [default0]:Skipping sample id=2488495. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2747654. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2732515. 
Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2727079. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2482887. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2716977. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2753490. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2722009. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2752341. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2726986. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2742272. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2751123. Maximum sequence length: 2049, sample length: 5868 [default0]:Skipping sample id=2722121. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2732942. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2481100. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2717781. Maximum sequence length: 2049, sample length: 4802 [default0]:Skipping sample id=2728862. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2477156. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2749803. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2745952. Maximum sequence length: 2049, sample length: 5751 [default0]:Skipping sample id=2746811. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2755099. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2718801. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2731843. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2749670. 
Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2756392. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2755507. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2740130. Maximum sequence length: 2049, sample length: 4289 [default0]:Skipping sample id=2721973. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2721109. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2721453. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2477942. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2727274. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2736300. Maximum sequence length: 2049, sample length: 4486 [default0]:Skipping sample id=2712012. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2711096. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2467293. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2733635. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2756023. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2742431. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2491873. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2748353. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2499226. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2726230. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2479495. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2489224. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2734100. 
Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2733742. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2712089. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2714960. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2711887. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2736797. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2739319. Maximum sequence length: 2049, sample length: 6630 [default0]:Skipping sample id=2723252. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2711040. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2717835. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2721241. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2739594. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2725502. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2497905. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2749575. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2742083. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2715008. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2730479. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2720026. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2722825. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2737820. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2730775. Maximum sequence length: 2049, sample length: 6421 [default0]:Skipping sample id=2483470. 
Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735893. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2719503. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2737827. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2715890. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2725687. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2737525. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2482706. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2740976. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2726435. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2738450. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2734196. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2492542. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2726507. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2755930. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2716028. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2739145. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2746490. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2731798. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2747226. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2738731. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2750428. Maximum sequence length: 2049, sample length: 3770 [default0]:Skipping sample id=2737951. 
Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2470622. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2747912. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2755095. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2741208. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2729648. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2747214. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2756372. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2735123. Maximum sequence length: 2049, sample length: 4975 [default0]:Skipping sample id=2742558. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2492555. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2495976. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2727590. Maximum sequence length: 2049, sample length: 5933 [default0]:Skipping sample id=2734833. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2717455. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2726164. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2748017. Maximum sequence length: 2049, sample length: 5323 [default0]:Skipping sample id=2482038. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2750527. Maximum sequence length: 2049, sample length: 4205 [default0]:Skipping sample id=2739203. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2736154. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2713532. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2733434. 
Maximum sequence length: 2049, sample length: 4924 [default0]:Skipping sample id=2750812. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2757092. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2467895. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2745938. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2714892. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2751331. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2488885. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2721498. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2737327. Maximum sequence length: 2049, sample length: 3464 [default0]:Skipping sample id=2726995. Maximum sequence length: 2049, sample length: 3449 [default0]:Skipping sample id=2715169. Maximum sequence length: 2049, sample length: 3622 [default0]:Skipping sample id=2483181. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2750995. Maximum sequence length: 2049, sample length: 6148 [default0]:Skipping sample id=2736414. Maximum sequence length: 2049, sample length: 4528 [default0]:Skipping sample id=2731815. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2742063. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2493137. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2754910. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2738300. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2735446. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2717070. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2753416. 
Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2745923. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2727675. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2745650. Maximum sequence length: 2049, sample length: 4243 [default0]:Skipping sample id=2733857. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2486205. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2754776. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2469262. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2717852. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2738378. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2736370. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2713991. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2728306. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2718012. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2739623. Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2712959. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2722639. Maximum sequence length: 2049, sample length: 5339 [default0]:Skipping sample id=2477157. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2718050. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2747158. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2757035. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2489876. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715749. 
Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2485089. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2720798. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2755308. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2733129. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2757112. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2724638. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2715383. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2727945. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2739656. Maximum sequence length: 2049, sample length: 4248 [default0]:Skipping sample id=2714241. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2722996. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2741748. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2746882. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2733381. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2729233. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2747136. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2715886. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2725427. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2751480. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2740012. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2738400. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2740217. 
Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2749430. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2479048. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2483085. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2714259. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2754961. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2724385. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2722938. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2491688. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2737454. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2483858. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2714415. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2735114. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2751242. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2746673. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2749433. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2717769. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2719405. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2730560. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2745364. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2745384. Maximum sequence length: 2049, sample length: 3537 [default0]:Skipping sample id=2751667. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2711986. 
Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2717171. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2719568. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2728981. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2726005. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2722822. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2495085. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2720730. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2739092. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2745917. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714909. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2752558. Maximum sequence length: 2049, sample length: 4506 [default0]:Skipping sample id=2493878. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2724762. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2741205. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2741254. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2497886. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2740728. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2748408. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2712166. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2751308. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2726811. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2488956. 
Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2748497. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2743243. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2480665. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2724672. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2722647. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2738986. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2713089. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2465913. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2752592. Maximum sequence length: 2049, sample length: 4842 [default0]:Skipping sample id=2718300. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2731794. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2740519. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2744446. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2750743. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2755794. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2723100. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2739451. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2737351. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2480015. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2738816. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2714525. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2755160. 
Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2731986. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2725606. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2467856. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2744841. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2718266. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2725902. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2741711. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2739471. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2467982. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2734977. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2727039. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2711708. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2741599. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2729304. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2749825. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2489643. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2740516. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719593. Maximum sequence length: 2049, sample length: 3353 [default0]:Skipping sample id=2746641. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2752395. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2754584. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2735266. 
Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2466448. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2753960. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2730985. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2711154. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2719660. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719201. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2495816. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2497044. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2740109. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2730107. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2720907. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2718342. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2755917. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2478460. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2711504. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2745478. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2734005. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2716869. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2718443. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2470621. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2745248. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2750158. 
Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2719330. Maximum sequence length: 2049, sample length: 4703 [default0]:Skipping sample id=2723016. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2729572. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2719254. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2743172. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2756248. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2743231. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2732611. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2730190. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2745200. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2741970. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2746812. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2750517. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2734172. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2740443. Maximum sequence length: 2049, sample length: 5058 [default0]:Skipping sample id=2751771. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2743058. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2751898. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2716971. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2725826. Maximum sequence length: 2049, sample length: 4181 [default0]:Skipping sample id=2739685. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2735326. 
Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2730819. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2733830. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2488630. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2712640. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2753357. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2734574. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2739733. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2714165. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2748909. Maximum sequence length: 2049, sample length: 4916 [default0]:Skipping sample id=2712028. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2725036. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2755122. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2756682. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2723858. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2731893. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2740668. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2726906. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2727311. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2465769. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721166. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2715354. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2722615. 
Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2731809. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2736439. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2720594. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2724944. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2722196. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2723085. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2741282. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2719017. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2722656. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2740309. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2724722. Maximum sequence length: 2049, sample length: 3360 [default0]:Skipping sample id=2751752. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2718150. Maximum sequence length: 2049, sample length: 6409 [default0]:Skipping sample id=2737518. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2712080. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2754897. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2469820. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2752014. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2729925. Maximum sequence length: 2049, sample length: 4765 [default0]:Skipping sample id=2743884. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2724652. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2734521. 
Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2728025. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2741272. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2749473. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2731077. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2736411. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2747976. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2493242. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2748814. Maximum sequence length: 2049, sample length: 5624 [default0]:Skipping sample id=2756028. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2730376. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2754622. Maximum sequence length: 2049, sample length: 3761 [default0]:Skipping sample id=2720066. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2718253. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2754551. Maximum sequence length: 2049, sample length: 3929 [default0]:Skipping sample id=2739299. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2756566. Maximum sequence length: 2049, sample length: 3770 [default0]:Skipping sample id=2717838. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2719413. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2482474. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2716093. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2494659. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2748825. 
Maximum sequence length: 2049, sample length: 4035 [default0]:Skipping sample id=2722962. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2727461. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2717127. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2729987. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2755558. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2736594. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2721385. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2738118. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2751046. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2732403. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2717504. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2718724. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2718479. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2744752. Maximum sequence length: 2049, sample length: 5145 [default0]:Skipping sample id=2718699. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2716907. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2726646. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2749467. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2721372. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2722924. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2747730. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2734397. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2712978. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2731366. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2726124. Maximum sequence length: 2049, sample length: 3219 [default0]:Skipping sample id=2727192. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2747360. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2722455. Maximum sequence length: 2049, sample length: 4162 [default0]:Skipping sample id=2484802. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2740543. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2496741. Maximum sequence length: 2049, sample length: 3530 [default0]:Skipping sample id=2753363. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2494139. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2751343. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2725953. Maximum sequence length: 2049, sample length: 3967 [default0]:Skipping sample id=2712360. Maximum sequence length: 2049, sample length: 4977 [default0]:Skipping sample id=2721866. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2748697. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2735785. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2724734. Maximum sequence length: 2049, sample length: 3863 [default0]:Skipping sample id=2741473. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2730413. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2715669. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2712607. 
Maximum sequence length: 2049, sample length: 2706 [default0]:Skipping sample id=2715830. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2739283. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2714934. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2714192. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2732624. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2721221. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2756277. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2719763. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2728485. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2727021. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2747367. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2755284. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2724136. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2755774. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2732992. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2755712. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2725892. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2728545. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2748476. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2716727. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2488581. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2715764. 
Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2744835. Maximum sequence length: 2049, sample length: 3776 [default0]:Skipping sample id=2746453. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2731440. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2733867. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2732232. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2753607. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2751754. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2469088. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2732173. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2748011. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2714961. Maximum sequence length: 2049, sample length: 4832 [default0]:Skipping sample id=2489756. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2735042. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2740689. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2719004. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2480714. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2477756. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2722137. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2487866. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2479972. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2720221. Maximum sequence length: 2049, sample length: 6265 [default0]:Skipping sample id=2479939. 
Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2736639. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2496768. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2749331. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2727185. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2493470. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2744496. Maximum sequence length: 2049, sample length: 4440 [default0]:Skipping sample id=2756459. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2715192. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2755450. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2720708. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2479062. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2733078. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2715695. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2733416. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2721577. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2742682. Maximum sequence length: 2049, sample length: 4368 [default0]:Skipping sample id=2729301. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2719943. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2724631. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2735041. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2747742. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2754292. 
Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2480106. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2467958. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2729242. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2756073. Maximum sequence length: 2049, sample length: 5015 [default0]:Skipping sample id=2733671. Maximum sequence length: 2049, sample length: 7200 [default0]:Skipping sample id=2717924. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2727103. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2729300. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2744247. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2723927. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2717216. Maximum sequence length: 2049, sample length: 4332 [default0]:Skipping sample id=2726235. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2735057. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2723946. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2738335. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2717047. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2721634. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2748260. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2745645. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717767. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2731334. Maximum sequence length: 2049, sample length: 4303 [default0]:Skipping sample id=2731972. 
Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2715977. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2713068. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2712169. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2752359. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2733360. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2727765. Maximum sequence length: 2049, sample length: 4475 [default0]:Skipping sample id=2756639. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2731544. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2482070. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2747033. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2754287. Maximum sequence length: 2049, sample length: 3066 [default0]:Skipping sample id=2487611. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2726042. Maximum sequence length: 2049, sample length: 4503 [default0]:Skipping sample id=2742216. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2715409. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2751991. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2742497. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2715589. Maximum sequence length: 2049, sample length: 4736 [default0]:Skipping sample id=2711911. Maximum sequence length: 2049, sample length: 4714 [default0]:Skipping sample id=2730922. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2738861. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2492429. 
Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2711365. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2492357. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2750240. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2468310. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2742788. Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2737169. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2746767. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2714937. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2750329. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2752771. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2482068. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2751839. Maximum sequence length: 2049, sample length: 4321 [default0]:Skipping sample id=2714898. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2724836. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2738789. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2477386. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2749921. Maximum sequence length: 2049, sample length: 3070 [default0]:Skipping sample id=2747678. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2468797. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2736609. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2726632. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2717693. 
Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2744563. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2730648. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2744047. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2727211. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2731362. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2490968. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2720402. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2742015. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2746351. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2733064. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2744687. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489855. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2493987. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2741424. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2738523. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2466012. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2730244. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2493081. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2753495. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2713868. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2754491. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2748935. 
Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2724556. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2747189. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2742633. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2728776. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2728842. Maximum sequence length: 2049, sample length: 5707 [default0]:Skipping sample id=2755903. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2736503. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735762. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2495124. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2750713. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2755668. Maximum sequence length: 2049, sample length: 3679 [default0]:Skipping sample id=2712306. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2727460. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2498775. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2494751. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2733828. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2747187. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2726250. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2723146. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2755328. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2727370. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2496923. 
Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2722563. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2739310. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2738991. Maximum sequence length: 2049, sample length: 4219 [default0]:Skipping sample id=2734710. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2723604. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2717014. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2491719. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2748472. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2719038. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2721517. Maximum sequence length: 2049, sample length: 4418 [default0]:Skipping sample id=2733316. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2732830. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2719347. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2737081. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2719217. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2728538. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2726606. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2490354. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2732736. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2486464. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2728919. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2721892. 
Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2467250. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2746383. Maximum sequence length: 2049, sample length: 3621 [default0]:Skipping sample id=2748500. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2743991. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722248. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2722150. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2725568. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2732042. Maximum sequence length: 2049, sample length: 4673 [default0]:Skipping sample id=2749991. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2736704. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2733674. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2722879. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717772. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2724171. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2749392. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2752791. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2492434. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2756966. Maximum sequence length: 2049, sample length: 5239 [default0]:Skipping sample id=2737433. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2498146. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2721750. Maximum sequence length: 2049, sample length: 4053 [default0]:Skipping sample id=2745260. 
Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2497441. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2750166. Maximum sequence length: 2049, sample length: 6563 [default0]:Skipping sample id=2747256. Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2495672. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2750466. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2717795. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2736975. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2753316. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2716470. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2716904. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2727227. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2739491. Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2478722. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2716843. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2752373. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737138. Maximum sequence length: 2049, sample length: 4125 [default0]:Skipping sample id=2740290. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2750929. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2466717. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2723937. Maximum sequence length: 2049, sample length: 3153 [default0]:Skipping sample id=2731879. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2733631. 
Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2719410. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2726366. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2752050. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2721007. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2719498. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2494557. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2726181. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2743152. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2715742. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2478267. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2725870. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2751345. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2723453. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2751511. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2495133. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2724439. Maximum sequence length: 2049, sample length: 5981 [default0]:Skipping sample id=2737434. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2711888. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2733305. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2719569. Maximum sequence length: 2049, sample length: 6863 [default0]:Skipping sample id=2730232. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2719388. 
Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2718045. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2738354. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2746778. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2747110. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2732040. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2751280. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2743968. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2469354. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2732871. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2712030. Maximum sequence length: 2049, sample length: 14257 [default0]:Skipping sample id=2481989. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2722176. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2740097. Maximum sequence length: 2049, sample length: 6417 [default0]:Skipping sample id=2746669. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2481113. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2752891. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2748571. Maximum sequence length: 2049, sample length: 3638 [default0]:Skipping sample id=2752859. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2731415. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2752151. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2728171. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2714686. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2470740. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2756089. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2740722. Maximum sequence length: 2049, sample length: 7556 [default0]:Skipping sample id=2754534. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2715279. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2722030. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2755151. Maximum sequence length: 2049, sample length: 8127 [default0]:Skipping sample id=2741838. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2487996. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2494163. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2467689. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2753670. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2724365. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2714832. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2721103. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2744955. Maximum sequence length: 2049, sample length: 5859 [default0]:Skipping sample id=2712579. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2727690. Maximum sequence length: 2049, sample length: 6946 [default0]:Skipping sample id=2738220. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2713489. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2731465. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2746717. 
Maximum sequence length: 2049, sample length: 6239 [default0]:Skipping sample id=2720387. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2723372. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2748963. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2752963. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2742931. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2479085. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2733732. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2713151. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2753257. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2726689. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721194. Maximum sequence length: 2049, sample length: 5988 [default0]:Skipping sample id=2756626. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2718606. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2746020. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718912. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2721191. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2715539. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2752007. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2736365. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2711260. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2722084. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2750660. 
Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2753039. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2710963. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2729045. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2711838. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2467366. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2748108. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2748531. Maximum sequence length: 2049, sample length: 5160 [default0]:Skipping sample id=2487206. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2737237. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2491793. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2724160. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2713978. Maximum sequence length: 2049, sample length: 5423 [default0]:Skipping sample id=2734834. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2727903. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2731881. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2722731. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2495028. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2725393. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2466398. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2480672. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2750831. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2480575. 
Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2723735. Maximum sequence length: 2049, sample length: 3854 [default0]:Skipping sample id=2735861. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2732833. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2737919. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2755696. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2730796. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2754697. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2729279. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2741720. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2723647. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2499301. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2493108. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2723305. Maximum sequence length: 2049, sample length: 4189 [default0]:Skipping sample id=2719282. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2711131. Maximum sequence length: 2049, sample length: 4878 [default0]:Skipping sample id=2720810. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2744832. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2748084. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2739452. Maximum sequence length: 2049, sample length: 4060 [default0]:Skipping sample id=2727114. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2739585. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2720972. 
Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2742004. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2725416. Maximum sequence length: 2049, sample length: 5991 [default0]:Skipping sample id=2751185. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2723190. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2712997. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2746215. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2720765. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2718153. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2712877. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731711. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2738773. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2487921. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2727511. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2470381. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2495403. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2711271. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2753435. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2734480. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2722648. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2482299. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2478142. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2727849. 
Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2719623. Maximum sequence length: 2049, sample length: 4153 [default0]:Skipping sample id=2719328. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2467479. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2712594. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2722181. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2747236. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2469413. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2715303. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2725473. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2756210. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2720499. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2734987. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2714991. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2715638. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2484554. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2722811. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2739228. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2730159. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2726335. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2754930. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2738294. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741029. 
Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2719796. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2740079. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2718874. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2724994. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2489474. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2716677. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2467012. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2756931. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2734184. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2742613. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2745238. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2731396. Maximum sequence length: 2049, sample length: 4883 [default0]:Skipping sample id=2738542. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2719300. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2488560. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2717084. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2715969. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2482697. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2738198. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2730272. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2742210. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2729185. 
Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2488770. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2719919. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2729180. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2750351. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2712560. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2747904. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2736470. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2722062. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2723065. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2724939. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2716564. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2746652. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2751934. Maximum sequence length: 2049, sample length: 4141 [default0]:Skipping sample id=2711920. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2756335. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2469758. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2745666. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2749795. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2716889. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2741889. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2734428. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2738487. 
Maximum sequence length: 2049, sample length: 4695 [default0]:Skipping sample id=2732525. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2721757. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2469954. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2745834. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2736576. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2751444. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2749240. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2730261. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2754615. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2714499. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2752867. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2484911. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2750613. Maximum sequence length: 2049, sample length: 5319 [default0]:Skipping sample id=2734866. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2713986. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2747821. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2746464. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2743569. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2466452. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2719637. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2746689. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2732281. 
Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2746946. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2747383. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2735481. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2715612. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2725540. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2734991. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2739274. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2737276. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2722783. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2745770. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2745091. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2743023. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2732951. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2732395. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2724789. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2756930. Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2717360. Maximum sequence length: 2049, sample length: 5517 [default0]:Skipping sample id=2752880. Maximum sequence length: 2049, sample length: 4227 [default0]:Skipping sample id=2716050. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2752892. Maximum sequence length: 2049, sample length: 4936 [default0]:Skipping sample id=2734910. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2741761. 
Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2482480. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2746179. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2713763. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2741178. Maximum sequence length: 2049, sample length: 4867 [default0]:Skipping sample id=2745058. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2743129. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2731104. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2733673. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2730004. Maximum sequence length: 2049, sample length: 6604 [default0]:Skipping sample id=2727225. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2741989. Maximum sequence length: 2049, sample length: 7148 [default0]:Skipping sample id=2751350. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2732755. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2712100. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2742030. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2492274. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2752118. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2715177. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2489497. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2739100. Maximum sequence length: 2049, sample length: 5527 [default0]:Skipping sample id=2749380. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2496805. 
Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2714149. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2711130. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2725143. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2754160. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2727171. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2735525. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2732950. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2711582. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2748841. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2750840. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2720445. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2495903. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2722896. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2494338. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2713631. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714462. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2733639. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2752152. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2736369. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2719324. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2728530. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2747153. 
Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2718189. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2756207. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2731278. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2738632. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2747578. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2487399. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2485662. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2488248. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2730197. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2498540. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2727661. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2735015. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2484294. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2750952. Maximum sequence length: 2049, sample length: 3662 [default0]:Skipping sample id=2716142. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2716491. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2715092. Maximum sequence length: 2049, sample length: 6265 [default0]:Skipping sample id=2728979. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2735440. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2477860. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2721804. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2730277. 
Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2722245. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2731479. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2714452. Maximum sequence length: 2049, sample length: 6638 [default0]:Skipping sample id=2751229. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2734244. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2716070. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2740641. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2738053. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2739215. Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2713667. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2725435. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2495315. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2713086. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2712053. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2724692. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2490088. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2724156. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2743578. Maximum sequence length: 2049, sample length: 3135 [default0]:Skipping sample id=2745212. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2719555. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2743685. Maximum sequence length: 2049, sample length: 5616 [default0]:Skipping sample id=2728922. 
Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2720251. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2736250. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2723155. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2725534. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2725115. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2731807. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2754602. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2755814. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2720561. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2717195. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2747757. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2738016. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2720518. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2718143. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2725948. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2754361. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2729296. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2728015. Maximum sequence length: 2049, sample length: 5347 [default0]:Skipping sample id=2755684. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2740181. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2713323. Maximum sequence length: 2049, sample length: 6328 [default0]:Skipping sample id=2712489. 
Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2716014. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2748658. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2719721. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2721481. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2497302. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2722926. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2714778. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2727265. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2736095. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2713525. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2716415. Maximum sequence length: 2049, sample length: 4514 [default0]:Skipping sample id=2743563. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2737717. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2484158. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2712295. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2755963. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2749815. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2714858. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2724676. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2722286. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2727635. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2467535. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2742058. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2727960. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2727728. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2486910. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2478129. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2733480. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2740016. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2728927. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2743039. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2479090. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2736718. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2752955. Maximum sequence length: 2049, sample length: 3954 [default0]:Skipping sample id=2742716. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2717302. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2749611. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2742959. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2748379. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2738705. Maximum sequence length: 2049, sample length: 3729 [default0]:Skipping sample id=2733724. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2731380. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2498450. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2487540. 
Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2727325. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2734777. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2713245. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2490069. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2743287. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2742432. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2726985. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2715311. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2747953. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2745531. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2735507. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2468031. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2728050. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2732674. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2751902. Maximum sequence length: 2049, sample length: 3562 [default0]:Skipping sample id=2727226. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2731144. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2498612. Maximum sequence length: 2049, sample length: 2532 [default0]:Skipping sample id=2730867. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2736715. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2721557. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2477316. 
Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2745896. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2742832. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2722377. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2711730. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2736346. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2712663. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2730447. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2719567. Maximum sequence length: 2049, sample length: 3802 [default0]:Skipping sample id=2754980. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2723395. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2715215. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2729311. Maximum sequence length: 2049, sample length: 3495 [default0]:Skipping sample id=2485296. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2717602. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2711843. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2718271. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2490458. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2732943. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2722629. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2733989. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2736239. Maximum sequence length: 2049, sample length: 8151 [default0]:Skipping sample id=2715293. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2749727. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2726947. Maximum sequence length: 2049, sample length: 4957 [default0]:Skipping sample id=2729823. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2483148. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2752872. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2725188. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2753046. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2730095. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2745298. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2711448. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2733581. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2481280. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2719079. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2755947. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2754856. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2729826. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2741745. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2498930. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2754811. Maximum sequence length: 2049, sample length: 5027 [default0]:Skipping sample id=2753031. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2738531. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2746363. 
Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2736399. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2491368. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2727175. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2736102. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2750276. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2715861. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2751940. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2743250. Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2478535. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2733964. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2755508. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2736525. Maximum sequence length: 2049, sample length: 4883 [default0]:Skipping sample id=2718088. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2722174. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2743565. Maximum sequence length: 2049, sample length: 2654 [default0]:Skipping sample id=2750064. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2756272. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2738375. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2721696. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2736233. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2740918. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2734344. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2730564. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2744824. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2720117. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2727585. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2729066. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2495498. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2753090. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2732198. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2753943. Maximum sequence length: 2049, sample length: 5621 [default0]:Skipping sample id=2728660. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2739390. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2479028. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2749304. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2711507. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2736806. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2737992. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2722602. Maximum sequence length: 2049, sample length: 4926 [default0]:Skipping sample id=2730422. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2744307. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2719021. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2482269. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2745977. 
Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2725850. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2712426. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2750748. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2733951. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2756205. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2744523. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2756781. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2751512. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2716336. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2744163. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2714626. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2711075. Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2487988. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2478051. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2756666. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2720606. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2742846. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2745195. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2718823. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2751219. Maximum sequence length: 2049, sample length: 4010 [default0]:Skipping sample id=2747945. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2487010. 
Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2740576. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2712921. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2756305. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2726320. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2716891. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2481492. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2727044. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2747275. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2717322. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2746849. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2755369. Maximum sequence length: 2049, sample length: 5412 [default0]:Skipping sample id=2489289. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2744714. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2722543. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2716472. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2743322. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2731683. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2728728. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2712844. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2734595. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2746567. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2740325. 
Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2729166. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2748118. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2713922. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2755319. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2490435. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2736166. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2731151. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2711740. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2754347. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2731721. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2748220. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2755666. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2740271. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2732131. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2745811. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2742260. Maximum sequence length: 2049, sample length: 5012 [default0]:Skipping sample id=2468481. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2730545. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2733032. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2498247. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2734885. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2747469. 
Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2735381. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2482683. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2754720. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2490797. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2711623. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2714932. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2728042. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2476967. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2742795. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2728053. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2756041. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2479929. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2748274. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2741796. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2728091. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2493227. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2729628. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2723945. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2467838. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2718272. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2749591. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2482895. 
Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2741875. Maximum sequence length: 2049, sample length: 4783 [default0]:Skipping sample id=2738119. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2712992. Maximum sequence length: 2049, sample length: 5381 [default0]:Skipping sample id=2725068. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484050. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2732434. Maximum sequence length: 2049, sample length: 3871 [default0]:Skipping sample id=2716494. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2713157. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2712243. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2468581. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2736651. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2752195. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2749814. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2485574. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739093. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2724697. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2718854. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2479049. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2729095. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2712450. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2723788. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2730423. 
Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2712286. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2740448. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2717618. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2754233. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2756575. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2712655. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2483433. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2730113. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2739738. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2734273. Maximum sequence length: 2049, sample length: 5573 [default0]:Skipping sample id=2747527. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2484880. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2746782. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2726382. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2746503. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2743679. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2726022. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2727386. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2499400. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2745044. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2745735. Maximum sequence length: 2049, sample length: 4593 [default0]:Skipping sample id=2714072. 
Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2750793. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2743348. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2741677. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2747782. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2483538. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2733144. Maximum sequence length: 2049, sample length: 5347 [default0]:Skipping sample id=2718821. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2724123. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2722569. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2747615. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2731949. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2740063. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2716135. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2489244. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2747655. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2725645. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2747478. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2726740. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2488721. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2719605. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2499316. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2735940. 
Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2724732. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2730732. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2722734. Maximum sequence length: 2049, sample length: 5093 [default0]:Skipping sample id=2727146. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2731056. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2711712. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2723386. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2725612. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2757060. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2496242. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2729392. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2753509. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2488774. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2748249. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2484721. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2717060. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2719999. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2498139. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2489853. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2716296. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2713731. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2745723. 
Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2483956. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2497897. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2752184. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2719741. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2467688. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729075. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2745218. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2755810. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2724650. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2722101. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2723609. Maximum sequence length: 2049, sample length: 3712 [default0]:Skipping sample id=2482286. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2734519. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2723493. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2752035. Maximum sequence length: 2049, sample length: 3557 [default0]:Skipping sample id=2751911. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2478219. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2735020. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2754194. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2732850. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2724349. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2731276. 
Maximum sequence length: 2049, sample length: 3867 [default0]:Skipping sample id=2477435. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2748376. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2487660. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2734629. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2748063. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2748114. Maximum sequence length: 2049, sample length: 5179 [default0]:Skipping sample id=2738726. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2488376. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2743232. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2712567. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2468401. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2730633. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2481311. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2717753. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2740237. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2749648. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2736025. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2739703. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2755217. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2736355. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2727652. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2757076. 
Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2735150. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2722653. Maximum sequence length: 2049, sample length: 3291 [default0]:Skipping sample id=2734776. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2719868. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2731067. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2477035. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2756659. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2745851. Maximum sequence length: 2049, sample length: 7555 [default0]:Skipping sample id=2741637. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2466820. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2739758. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2751897. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2732379. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2715388. Maximum sequence length: 2049, sample length: 6431 [default0]:Skipping sample id=2735835. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2717403. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2727116. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2745702. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2485706. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2754711. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2724747. Maximum sequence length: 2049, sample length: 7624 [default0]:Skipping sample id=2729817. 
Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2746521. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2717687. Maximum sequence length: 2049, sample length: 4334 [default0]:Skipping sample id=2754719. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2738161. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2728191. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2738103. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2466883. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2490074. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2719800. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2752920. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2738960. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2477075. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2478504. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2743652. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2716958. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2468133. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2736367. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2720021. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2723760. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2753969. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2732628. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2711715. 
Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2713441. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2740704. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2729758. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2735993. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2724912. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2726869. Maximum sequence length: 2049, sample length: 3452 [default0]:Skipping sample id=2718035. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2736761. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2749897. Maximum sequence length: 2049, sample length: 8478 [default0]:Skipping sample id=2726441. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2715240. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2713412. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2746413. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2753235. Maximum sequence length: 2049, sample length: 5808 [default0]:Skipping sample id=2721137. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2751038. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2711826. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2743069. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2732550. Maximum sequence length: 2049, sample length: 7261 [default0]:Skipping sample id=2726720. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2497207. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2748263. 
Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2750597. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2715643. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2726925. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2756712. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2735242. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2717187. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2730466. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2488815. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2492993. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2735528. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725025. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2713319. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2495710. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2732973. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2746431. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711354. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2726559. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2740122. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2737811. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2741638. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2748900. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2483094. 
Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2720380. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2730068. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2749681. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2731600. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2483948. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2734263. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2750888. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2711203. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2751292. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2742392. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2484724. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2746824. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2737759. Maximum sequence length: 2049, sample length: 5315 [default0]:Skipping sample id=2739593. Maximum sequence length: 2049, sample length: 3628 [default0]:Skipping sample id=2715269. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2740074. Maximum sequence length: 2049, sample length: 6239 [default0]:Skipping sample id=2732062. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2730372. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2740937. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2738996. Maximum sequence length: 2049, sample length: 3575 [default0]:Skipping sample id=2740427. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2722359. 
Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2725604. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2713313. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2750503. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2750144. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2487894. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2739439. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2743118. Maximum sequence length: 2049, sample length: 4687 [default0]:Skipping sample id=2756492. Maximum sequence length: 2049, sample length: 4086 [default0]:Skipping sample id=2749753. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2494992. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2741288. Maximum sequence length: 2049, sample length: 3262 [default0]:Skipping sample id=2740908. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2748099. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2735729. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2735594. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2744642. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2723755. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2468264. Maximum sequence length: 2049, sample length: 3548 [default0]:Skipping sample id=2729813. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2724025. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2712755. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2726879. 
Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2736765. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2755654. Maximum sequence length: 2049, sample length: 4403 [default0]:Skipping sample id=2714480. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2481277. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2744001. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2724513. Maximum sequence length: 2049, sample length: 4308 [default0]:Skipping sample id=2750629. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2477180. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2725162. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2744240. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2715647. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2728051. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2722113. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2499346. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2751473. Maximum sequence length: 2049, sample length: 3816 [default0]:Skipping sample id=2742634. Maximum sequence length: 2049, sample length: 4931 [default0]:Skipping sample id=2746100. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2739353. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2721583. Maximum sequence length: 2049, sample length: 4445 [default0]:Skipping sample id=2735864. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2483361. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2747159. 
Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2738968. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2486127. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2712730. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2736958. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2757004. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2728702. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2717463. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2733705. Maximum sequence length: 2049, sample length: 6481 [default0]:Skipping sample id=2748714. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2494190. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2493900. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2711860. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2736627. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2718860. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2740848. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2736452. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2718028. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2714471. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2741077. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2470567. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2749994. Maximum sequence length: 2049, sample length: 5083 [default0]:Skipping sample id=2728275. 
Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2715119. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2734048. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2713674. Maximum sequence length: 2049, sample length: 6810 [default0]:Skipping sample id=2741856. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2744873. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2755494. Maximum sequence length: 2049, sample length: 5055 [default0]:Skipping sample id=2751778. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2720207. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2751793. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2756091. Maximum sequence length: 2049, sample length: 4964 [default0]:Skipping sample id=2725293. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2756276. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2731447. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2494601. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2716697. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2731306. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2756778. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2726246. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2718106. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2715496. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2710959. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2744677. 
Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2485953. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2720558. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2736890. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2727649. Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2731598. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2727707. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2724572. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2746655. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2721882. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2470123. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2747917. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2734466. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2718126. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2734241. Maximum sequence length: 2049, sample length: 3677 [default0]:Skipping sample id=2744340. Maximum sequence length: 2049, sample length: 2929 [default0]:Skipping sample id=2734984. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2744674. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2712858. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2737856. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2735571. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2755568. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2726952. 
Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2717559. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2719228. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2743915. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2729663. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2713940. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2723454. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2483313. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2742606. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2738456. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2730212. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2731746. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2728920. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2730143. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2753755. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2738170. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2711403. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2749595. Maximum sequence length: 2049, sample length: 4927 [default0]:Skipping sample id=2739301. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2725799. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2737889. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2744503. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2713110. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2740353. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2471026. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2728000. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2751620. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2480809. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2755251. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2746686. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2738845. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2725671. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2740032. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2735447. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2742808. Maximum sequence length: 2049, sample length: 7067 [default0]:Skipping sample id=2711054. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2726220. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2735936. Maximum sequence length: 2049, sample length: 4591 [default0]:Skipping sample id=2742128. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2467389. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2478439. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2726561. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2735213. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2715567. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2723058. 
Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2725932. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2742439. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2755819. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2734299. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2725379. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2723518. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2478844. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2712819. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2483635. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2737050. Maximum sequence length: 2049, sample length: 5829 [default0]:Skipping sample id=2735899. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2712621. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2722902. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2720390. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2715319. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2726820. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2738595. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2741919. Maximum sequence length: 2049, sample length: 5045 [default0]:Skipping sample id=2747910. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2748217. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2757031. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2740460. 
Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2713497. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2736705. Maximum sequence length: 2049, sample length: 4642 [default0]:Skipping sample id=2756829. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2720100. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2717843. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2745162. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2735989. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2743005. Maximum sequence length: 2049, sample length: 5327 [default0]:Skipping sample id=2732569. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2714485. Maximum sequence length: 2049, sample length: 4424 [default0]:Skipping sample id=2716781. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2466430. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2745111. Maximum sequence length: 2049, sample length: 5406 [default0]:Skipping sample id=2714928. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2740387. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2466886. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2750022. Maximum sequence length: 2049, sample length: 8128 [default0]:Skipping sample id=2751782. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2715708. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2751851. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2718973. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2756322. 
Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2739325. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2754405. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2746454. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2750682. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2732887. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2729246. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737804. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2737151. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2713401. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2749762. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2730947. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2729581. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2714489. Maximum sequence length: 2049, sample length: 7617 [default0]:Skipping sample id=2732074. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2739625. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2740316. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2494251. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2715849. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2728957. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2715679. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2736643. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2748709. 
Maximum sequence length: 2049, sample length: 4025 [default0]:Skipping sample id=2477874. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2469788. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2487102. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2721960. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2754050. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2731436. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2756102. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2713182. Maximum sequence length: 2049, sample length: 4030 [default0]:Skipping sample id=2727516. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2489487. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2720985. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2747695. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2717592. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2715420. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2741335. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2724228. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2747216. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2714163. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2713904. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2716752. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2717519. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2727478. 
Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2486653. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2741524. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2468572. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723587. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2733374. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2731424. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2727303. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2739716. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2741753. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2490502. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2733168. Maximum sequence length: 2049, sample length: 4792 [default0]:Skipping sample id=2495398. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2746969. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2748306. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2739337. Maximum sequence length: 2049, sample length: 4920 [default0]:Skipping sample id=2751671. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2714605. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2713060. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2720044. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2719573. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2752372. Maximum sequence length: 2049, sample length: 6493 [default0]:Skipping sample id=2741690. 
Maximum sequence length: 2049, sample length: 4689 [default0]:Skipping sample id=2752313. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2742652. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2714119. Maximum sequence length: 2049, sample length: 8234 [default0]:Skipping sample id=2733344. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2751055. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2717224. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2718361. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2749398. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2755738. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2751143. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2751877. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2755457. Maximum sequence length: 2049, sample length: 2767 [default0]:Skipping sample id=2484091. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2748027. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2726172. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2718177. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2713208. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2736859. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2731510. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2749409. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2722036. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2713885. 
Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2737628. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2712644. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2745454. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2754670. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2732191. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2739706. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2490565. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2491827. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2723731. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2743286. Maximum sequence length: 2049, sample length: 4271 [default0]:Skipping sample id=2714890. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2730811. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2752713. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2756748. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2713330. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2711966. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2498802. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2479180. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2729487. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2725013. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2723829. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2738372. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2718572. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2750938. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2718280. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2494714. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2715659. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2734116. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2732081. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2732394. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2746045. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2720038. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2747282. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2713098. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2729518. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735943. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2747126. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2743369. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2752973. Maximum sequence length: 2049, sample length: 5139 [default0]:Skipping sample id=2717314. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2479440. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2727179. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2733428. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2716868. 
Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2487778. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2735661. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2714784. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2737558. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2477463. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2731165. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2756875. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2494627. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2753920. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2754108. Maximum sequence length: 2049, sample length: 4814 [default0]:Skipping sample id=2716704. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2748191. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2729378. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2485138. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2751809. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2730202. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2731134. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2719783. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2753777. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2734440. Maximum sequence length: 2049, sample length: 3495 [default0]:Skipping sample id=2729131. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2714124. 
Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2713158. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2740602. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2482450. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2466282. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2722692. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2726107. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2717605. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2721377. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2712970. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2719581. Maximum sequence length: 2049, sample length: 6638 [default0]:Skipping sample id=2721598. Maximum sequence length: 2049, sample length: 6668 [default0]:Skipping sample id=2748415. Maximum sequence length: 2049, sample length: 4279 [default0]:Skipping sample id=2739667. Maximum sequence length: 2049, sample length: 3453 [default0]:Skipping sample id=2757070. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2718937. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2738639. Maximum sequence length: 2049, sample length: 4586 [default0]:Skipping sample id=2749524. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2722907. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2734056. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2752503. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2755157. Maximum sequence length: 2049, sample length: 4526 [default0]:Skipping sample id=2734934. 
Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2716528. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2727341. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2734390. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2755346. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2710964. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2496432. Maximum sequence length: 2049, sample length: 3014 [default0]:Skipping sample id=2721419. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2713475. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2746859. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2478181. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2755446. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2741243. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2737719. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2712842. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2730040. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2745461. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2752796. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2716230. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2734187. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2492086. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2752746. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2740685. 
Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2724882. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2738081. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2729351. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2753522. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2734708. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2485595. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2465791. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2493158. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2714649. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2714438. Maximum sequence length: 2049, sample length: 3802 [default0]:Skipping sample id=2713959. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2465889. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2754547. Maximum sequence length: 2049, sample length: 5108 [default0]:Skipping sample id=2744055. Maximum sequence length: 2049, sample length: 3739 [default0]:Skipping sample id=2741852. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2730174. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2490429. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2711999. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2743574. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2735052. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2732697. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2726605. 
Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2738693. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2743048. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2714949. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2498380. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2730011. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2715078. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2711844. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2487699. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2728132. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2727025. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2729236. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2754604. Maximum sequence length: 2049, sample length: 3319 [default0]:Skipping sample id=2743820. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2477844. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2496899. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2713951. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2727101. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2733741. Maximum sequence length: 2049, sample length: 4948 [default0]:Skipping sample id=2741283. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2746602. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2752610. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2755181. 
Maximum sequence length: 2049, sample length: 3707 [default0]:Skipping sample id=2736509. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2712212. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2719927. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2724047. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2713061. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2743712. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2718858. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2747405. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2747099. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2735152. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2711493. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2729074. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2493436. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2736232. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2735719. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2737711. Maximum sequence length: 2049, sample length: 6256 [default0]:Skipping sample id=2478965. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2715993. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2743061. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2737087. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2733750. Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2718666. 
Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2729515. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2748781. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2740582. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2732689. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718374. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2722310. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2497858. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2723728. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2744620. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2756640. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2720398. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2754726. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2715111. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2490751. Maximum sequence length: 2049, sample length: 3540 [default0]:Skipping sample id=2742972. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2724955. Maximum sequence length: 2049, sample length: 4577 [default0]:Skipping sample id=2750768. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2739500. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2725315. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2747395. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2725516. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2720432. 
Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2719502. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2479674. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2486816. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2711233. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2729136. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2716110. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2712714. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2755250. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723248. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2726857. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2733799. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2726148. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2719031. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2737687. Maximum sequence length: 2049, sample length: 3689 [default0]:Skipping sample id=2729992. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2751500. Maximum sequence length: 2049, sample length: 5167 [default0]:Skipping sample id=2714671. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2714011. Maximum sequence length: 2049, sample length: 4370 [default0]:Skipping sample id=2735409. Maximum sequence length: 2049, sample length: 6616 [default0]:Skipping sample id=2711416. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2498883. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2488202. 
Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2717606. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2711558. Maximum sequence length: 2049, sample length: 4718 [default0]:Skipping sample id=2718655. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2755305. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2731749. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2722677. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2750926. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2720090. Maximum sequence length: 2049, sample length: 4832 [default0]:Skipping sample id=2724487. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2716053. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2750132. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2719998. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2487242. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2484712. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2727149. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2718752. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2724997. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2720408. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2716992. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2738834. Maximum sequence length: 2049, sample length: 6211 [default0]:Skipping sample id=2755742. Maximum sequence length: 2049, sample length: 5318 [default0]:Skipping sample id=2478180. 
Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2731856. Maximum sequence length: 2049, sample length: 6650 [default0]:Skipping sample id=2498210. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2753601. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2728853. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2726202. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2725189. Maximum sequence length: 2049, sample length: 3443 [default0]:Skipping sample id=2718955. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2735870. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2740988. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2754264. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2756921. Maximum sequence length: 2049, sample length: 3753 [default0]:Skipping sample id=2735200. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2713459. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2733447. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2755496. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2725825. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2717130. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2744762. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2720371. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2729605. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2724796. Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2477151. 
Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2741858. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2495915. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2754840. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2488359. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2740250. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2746818. Maximum sequence length: 2049, sample length: 3338 [default0]:Skipping sample id=2748968. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2746158. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2717136. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2741679. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2492918. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2740846. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2751459. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2735222. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2723429. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2745067. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2755458. Maximum sequence length: 2049, sample length: 3402 [default0]:Skipping sample id=2732905. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2729618. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2725592. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2736876. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2727417. 
Maximum sequence length: 2049, sample length: 14247 [default0]:Skipping sample id=2480444. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2725856. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2719838. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2741686. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2737111. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2721659. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2754467. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2727222. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2731121. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2725123. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2752871. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2723961. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2754193. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2756427. Maximum sequence length: 2049, sample length: 5343 [default0]:Skipping sample id=2729510. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2738042. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2729468. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2739615. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2735732. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2487914. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2471004. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2723400. 
Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2726109. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2723409. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2738000. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2716075. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2748873. Maximum sequence length: 2049, sample length: 4826 [default0]:Skipping sample id=2718802. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2727693. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2734430. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2739316. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2739526. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2725598. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2726873. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2478243. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2711450. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2724889. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2727671. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2741361. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2752197. Maximum sequence length: 2049, sample length: 5709 [default0]:Skipping sample id=2717773. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2731498. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2722279. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2737692. 
Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2488951. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2734185. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2748617. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2748206. Maximum sequence length: 2049, sample length: 6768 [default0]:Skipping sample id=2497310. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2745374. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2726253. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2725218. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2487279. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2723850. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2494435. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2718535. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2717960. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2739332. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2740682. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2731013. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2755011. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2730790. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2719755. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2723143. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2719427. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2730036. 
Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2723063. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2727754. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2715069. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2749487. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2755674. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2753606. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2740630. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2748880. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2723373. Maximum sequence length: 2049, sample length: 3767 [default0]:Skipping sample id=2733729. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2739481. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2716508. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2722311. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2740960. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2747144. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2489092. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2470341. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2720775. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2493783. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2712083. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2726889. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2493513. 
Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2751291. Maximum sequence length: 2049, sample length: 6405 [default0]:Skipping sample id=2754562. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2744912. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2734181. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2718489. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2752531. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2736968. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2730741. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2756703. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2743925. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2743460. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2736698. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2718057. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2732698. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2725500. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2723805. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2749526. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2724005. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2749158. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2490842. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2712938. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2722308. 
Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2729637. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2482967. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2719554. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2735005. Maximum sequence length: 2049, sample length: 7607 [default0]:Skipping sample id=2748951. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2491059. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2751344. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2715543. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2747937. Maximum sequence length: 2049, sample length: 7074 [default0]:Skipping sample id=2731024. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2724964. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2747185. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2717417. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2755906. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2734715. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2485446. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744633. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2735846. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2745194. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2715427. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2466673. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2712878. 
Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2734735. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2726757. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2486895. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2730590. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2739770. Maximum sequence length: 2049, sample length: 4520 [default0]:Skipping sample id=2724708. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2749701. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2713864. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2730650. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2750493. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2728937. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2493665. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2749652. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2750155. Maximum sequence length: 2049, sample length: 4433 [default0]:Skipping sample id=2493321. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2747029. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2734758. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2466987. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2748308. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2727120. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2712325. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2726590. 
Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2739982. Maximum sequence length: 2049, sample length: 5406 [default0]:Skipping sample id=2742757. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2720135. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2741698. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2751246. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2749585. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2720016. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2745480. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2734324. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2748428. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2499165. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2499092. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2741729. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2721966. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2751327. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2751832. Maximum sequence length: 2049, sample length: 4871 [default0]:Skipping sample id=2494203. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2466022. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2720347. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2745739. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2740365. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2715674. 
Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2713498. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2726681. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2755111. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2499240. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2716000. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2740070. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2723570. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2479416. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2726237. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2465739. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2753030. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2745949. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2489074. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2751446. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2731756. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2713732. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2730530. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2749943. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2713342. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2728081. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2716006. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2712265. 
Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2724162. Maximum sequence length: 2049, sample length: 3048 [default0]:Skipping sample id=2477559. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2753897. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2731805. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2727219. Maximum sequence length: 2049, sample length: 5634 [default0]:Skipping sample id=2483685. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2727093. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2730431. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2724757. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2723319. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2725318. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2737757. Maximum sequence length: 2049, sample length: 4899 [default0]:Skipping sample id=2727329. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2741253. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2733401. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2733933. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2728447. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2731474. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733829. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2493950. Maximum sequence length: 2049, sample length: 3592 [default0]:Skipping sample id=2737080. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2721544. 
Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2747827. Maximum sequence length: 2049, sample length: 3190 [default0]:Skipping sample id=2745135. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2730757. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2728506. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2730318. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2725313. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2757073. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2745797. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2730739. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2743402. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2739057. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2713931. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2468679. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2750197. Maximum sequence length: 2049, sample length: 4247 [default0]:Skipping sample id=2730799. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2744274. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2723618. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2730638. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2483152. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2729972. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2750163. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2736648. 
Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2729769. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2726437. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2716332. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2718040. Maximum sequence length: 2049, sample length: 5821 [default0]:Skipping sample id=2717973. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2741819. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2711859. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2734615. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2491345. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2713494. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2470271. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2730219. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2743488. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2754195. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2748451. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2738367. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2745251. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2726324. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2747184. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2715365. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2743987. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2737213. 
Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2735162. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2492625. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2488629. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2716946. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2743437. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2720875. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2730333. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2732589. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2715350. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2754796. Maximum sequence length: 2049, sample length: 5615 [default0]:Skipping sample id=2746653. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2725072. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2715639. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2714263. Maximum sequence length: 2049, sample length: 6639 [default0]:Skipping sample id=2712198. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2736431. Maximum sequence length: 2049, sample length: 4063 [default0]:Skipping sample id=2711384. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2745270. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2725801. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2715216. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2716410. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2740184. 
Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2491958. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2716746. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2718518. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2720250. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2715017. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2728457. Maximum sequence length: 2049, sample length: 3354 [default0]:Skipping sample id=2749104. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2742521. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2746448. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2755493. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2747568. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2736885. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2752864. Maximum sequence length: 2049, sample length: 5463 [default0]:Skipping sample id=2711387. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2745249. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2726257. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2717416. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2470641. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2735761. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2735524. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2734291. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2736600. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2726162. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2756644. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2750537. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2712512. Maximum sequence length: 2049, sample length: 4386 [default0]:Skipping sample id=2727771. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2748392. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2748625. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2746672. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2734494. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2746177. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2755263. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2719833. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2494117. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2495523. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2747396. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2733928. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2741499. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2743347. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2722502. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2716956. Maximum sequence length: 2049, sample length: 3485 [default0]:Skipping sample id=2734966. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2466529. 
Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2730748. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2730804. Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2712287. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2725129. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2756617. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2745522. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2746736. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2720724. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2477176. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2752580. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2718997. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2467354. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2743629. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739586. Maximum sequence length: 2049, sample length: 3522 [default0]:Skipping sample id=2715419. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2753434. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2468686. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2717176. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2721108. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2728531. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2753326. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2490660. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2732268. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2722372. Maximum sequence length: 2049, sample length: 5473 [default0]:Skipping sample id=2478597. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2724381. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2748747. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2736717. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2727823. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2755113. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2723680. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2719734. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2467583. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2742224. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2714128. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2720304. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2713217. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2739164. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2746660. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2488400. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2739548. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2488568. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2726592. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721374. 
Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2727399. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2734993. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2741181. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2749066. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2751436. Maximum sequence length: 2049, sample length: 4874 [default0]:Skipping sample id=2733160. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2483357. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2754632. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2721585. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2736709. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2752527. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2714855. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2753122. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2731895. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2751387. Maximum sequence length: 2049, sample length: 5272 [default0]:Skipping sample id=2750718. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2724959. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2730093. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2744696. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2738214. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2729709. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2484589. 
Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2730679. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2752468. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2752080. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2716169. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2724308. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2741563. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714793. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2740320. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746119. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2719559. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2751179. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2714565. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2745268. Maximum sequence length: 2049, sample length: 4040 [default0]:Skipping sample id=2730058. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2724863. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2489578. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2493128. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2714683. Maximum sequence length: 2049, sample length: 4209 [default0]:Skipping sample id=2737065. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2730436. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2737732. Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2750015. 
Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2728588. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2756324. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2715697. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2734224. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2729790. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2726151. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2493308. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2711022. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2714195. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2749292. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2747823. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2731031. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2711557. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2737837. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2732245. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2493310. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2715594. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2485341. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2755919. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2726517. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2712911. Maximum sequence length: 2049, sample length: 5821 [default0]:Skipping sample id=2736585. 
Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2749493. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2746467. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2484233. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2721784. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2713668. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2729418. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2491706. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2712905. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2751614. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2752460. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2466121. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2749092. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2731690. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2714399. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2721921. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2726137. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2747732. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2747926. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2470530. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2715893. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2716384. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2756419. 
Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2712311. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2729193. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2733678. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2745783. Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2729083. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2754416. Maximum sequence length: 2049, sample length: 4407 [default0]:Skipping sample id=2746916. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2752755. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2496972. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2748054. Maximum sequence length: 2049, sample length: 4959 [default0]:Skipping sample id=2726007. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2718871. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2745558. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2754174. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2497328. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2739140. Maximum sequence length: 2049, sample length: 3541 [default0]:Skipping sample id=2741289. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2752193. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2718061. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735487. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2722456. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2717500. 
Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2754355. Maximum sequence length: 2049, sample length: 6165 [default0]:Skipping sample id=2721416. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2745092. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2737405. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2738349. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2484154. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2745712. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2723123. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2747663. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2750484. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2711636. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2722599. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718355. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2753927. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2731670. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2494131. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2751757. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2719416. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2716411. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2726727. Maximum sequence length: 2049, sample length: 6651 [default0]:Skipping sample id=2717093. Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2737194. 
Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2738807. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2752031. Maximum sequence length: 2049, sample length: 4265 [default0]:Skipping sample id=2493466. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2721349. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2731480. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2741277. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2732409. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2481748. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2739736. Maximum sequence length: 2049, sample length: 5074 [default0]:Skipping sample id=2718029. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2726234. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2711050. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2728958. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2492951. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2711120. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2746553. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2755563. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2491841. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2743792. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2467696. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2718452. Maximum sequence length: 2049, sample length: 6328 [default0]:Skipping sample id=2713348. 
Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2739144. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2751681. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2728055. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2719117. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2714636. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2480728. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2742405. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2466605. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2718123. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2483394. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2721714. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2747374. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2730427. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2470565. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2713301. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2756308. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2753596. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2723621. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2495717. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2736492. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2483057. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2742705. 
Maximum sequence length: 2049, sample length: 3645 [default0]:Skipping sample id=2716557. Maximum sequence length: 2049, sample length: 3289 [default0]:Skipping sample id=2731305. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2755144. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2718794. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2756802. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2743338. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2721503. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2467577. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2734711. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2723416. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2746455. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2470874. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2753082. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2738412. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2719771. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2715680. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2731315. Maximum sequence length: 2049, sample length: 4557 [default0]:Skipping sample id=2747481. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2741039. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2756986. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2748881. Maximum sequence length: 2049, sample length: 3810 [default0]:Skipping sample id=2738659. 
Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2741186. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2727162. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2724510. Maximum sequence length: 2049, sample length: 4544 [default0]:Skipping sample id=2714870. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2742988. Maximum sequence length: 2049, sample length: 3072 [default0]:Skipping sample id=2483382. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2717159. Maximum sequence length: 2049, sample length: 14257 [default0]:Skipping sample id=2750794. Maximum sequence length: 2049, sample length: 4580 [default0]:Skipping sample id=2466220. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2720692. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2715149. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2713260. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2738356. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2742366. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2739963. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2744887. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2735903. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2741619. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2755916. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2712184. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2726465. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2723024. 
Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2714982. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2750728. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2741894. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2725527. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2729701. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2726427. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2729201. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2748818. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2466993. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2722646. Maximum sequence length: 2049, sample length: 4139 [default0]:Skipping sample id=2747246. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2714694. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2749771. Maximum sequence length: 2049, sample length: 2948 [default0]:Skipping sample id=2756444. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2716653. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2724006. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2719220. Maximum sequence length: 2049, sample length: 5313 [default0]:Skipping sample id=2744454. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2732741. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2496282. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2469485. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2481936. 
Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2753915. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2729681. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2751821. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2465933. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2713704. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2489913. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2750677. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2727850. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2730771. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2717956. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2723853. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2724903. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2717405. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2740941. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2738384. Maximum sequence length: 2049, sample length: 6003 [default0]:Skipping sample id=2488144. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2752690. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2739662. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2744686. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2498714. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2490272. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2729737. 
Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2738669. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2728668. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2742455. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2714845. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2477324. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2735090. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2734583. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2716207. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2719158. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2734351. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2491749. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2728192. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2718576. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2755063. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2736493. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2753183. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2718979. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2711813. Maximum sequence length: 2049, sample length: 7283 [default0]:Skipping sample id=2746981. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2747143. Maximum sequence length: 2049, sample length: 2767 [default0]:Skipping sample id=2736582. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2732719. 
Maximum sequence length: 2049, sample length: 5933 [default0]:Skipping sample id=2741470. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2468152. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2746590. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2745226. Maximum sequence length: 2049, sample length: 6417 [default0]:Skipping sample id=2713564. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2735264. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2721864. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2749817. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2756944. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2735505. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2734296. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2735988. Maximum sequence length: 2049, sample length: 4491 [default0]:Skipping sample id=2491518. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2720246. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2745815. Maximum sequence length: 2049, sample length: 5010 [default0]:Skipping sample id=2745683. 
Maximum sequence length: 2049, sample length: 3234 [default1]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default0]: > elasped time to build and save shuffle-idx and sample-idx mapping (seconds): 7.535532 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_26624ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_26624ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.017 seconds [default0]:> finished creating T0 datasets ... [default0]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 
[default0]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default4]:[000-051] 177.5835B / 177.5835B [default4]:[000-037] 177.5835B / 177.5835B [default4]:[000-039] 177.5835B / 177.5835B [default0]:[000-064] 177.5835B / 177.5835B [default0]:[000-012] 177.5835B / 177.5835B [default0]:[000-004] 177.5835B / 177.5835B [default0]:[000-060] 177.5835B / 177.5835B [default0]:[000-044] 177.5835B / 177.5835B [default4]:[000-021] 177.5835B / 177.5835B [default0]:[000-066] 177.5835B / 177.5835B [default0]:[000-024] 177.5835B / 177.5835B [default4]:[000-045] 177.5835B / 177.5835B [default4]:[000-013] 177.5835B / 177.5835B [default0]:[000-042] 177.5835B / 177.5835B [default4]:[000-025] 177.5835B / 177.5835B [default7]:time (ms) | model-and-optimizer-setup: 34776.50 | train/valid/test-data-iterators-setup: 18625.31 [default4]:[000-071] 258.9563B / 0.0000B [default0]:[000-034] 177.5835B / 177.5835B [default4]:[000-067] 177.5835B / 177.5835B [default0]:[000-010] 177.5835B / 177.5835B [default4]:[000-035] 177.5835B / 177.5835B [default0]:[000-054] 177.5835B / 177.5835B [default0]:[000-016] 177.5835B / 177.5835B [default4]:[000-029] 177.5835B / 177.5835B [default4]:[000-009] 177.5835B / 177.5835B [default4]:[000-017] 177.5835B / 177.5835B [default4]:[000-055] 177.5835B / 177.5835B [default4]:[000-015] 177.5835B / 177.5835B [default4]:[000-033] 177.5835B / 177.5835B [default0]:[after dataloaders are built] datetime: 2022-09-03 19:41:24 [default0]:done with setup ... [default0]:training ... 
[default0]:Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: [default0]:[000-000] 258.9584B / 0.0021B [default4]:[000-043] 177.5835B / 177.5835B [default4]:[000-061] 177.5835B / 177.5835B [default4]:[000-005] 177.5835B / 177.5835B [default4]:[000-019] 177.5835B / 177.5835B [default0]:[000-018] 177.5835B / 177.5835B [default0]:[000-058] 177.5835B / 177.5835B [default0]:[000-038] 177.5835B / 177.5835B [default4]:[000-041] 177.5835B / 177.5835B [default4]:[000-069] 177.5835B / 177.5835B [default0]:[000-068] 177.5835B / 177.5835B [default4]:[000-003] 177.5835B / 177.5835B [default4]:[000-059] 177.5835B / 177.5835B [default4]:[000-001] 177.5835B / 177.5835B [default0]:[000-070] 177.5855B / 177.5855B [default0]:[000-006] 177.5835B / 177.5835B [default0]:[000-062] 177.5835B / 177.5835B [default4]:[000-027] 177.5835B / 177.5835B [default0]:[000-046] 177.5835B / 177.5835B [default4]:[000-047] 177.5835B / 177.5835B [default4]:[000-023] 177.5835B / 177.5835B [default0]:[000-008] 177.5835B / 177.5835B [default0]:[000-050] 177.5835B / 177.5835B [default4]:[000-057] 177.5835B / 177.5835B [default0]:[000-056] 177.5835B / 177.5835B [default0]:[000-032] 177.5835B / 177.5835B [default0]:[000-052] 177.5835B / 177.5835B [default4]:[000-053] 177.5835B / 177.5835B [default0]:[000-014] 177.5835B / 177.5835B [default0]:[000-030] 177.5835B / 177.5835B [default4]:[000-031] 177.5835B / 177.5835B [default0]:[000-028] 177.5835B / 177.5835B [default0]:[000-026] 177.5835B / 177.5835B [default4]:[000-011] 177.5835B / 177.5835B [default0]:[000-020] 177.5835B / 177.5835B [default0]:[000-022] 177.5835B / 177.5835B [default4]:[000-049] 177.5835B / 177.5835B [default4]:[000-063] 177.5835B / 177.5835B [default0]:[000-040] 177.5835B / 177.5835B [default0]:[000-036] 177.5835B / 177.5835B [default0]:[000-002] 177.5835B / 177.5835B [default0]:[000-048] 177.5835B / 177.5835B [default4]:[000-065] 177.5835B / 177.5835B [default4]:[000-007] 177.5835B / 177.5835B 
[default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default0]:[before the start of training step] datetime: 2022-09-03 19:41:24 [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default0]: return self._grad [default2]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default6]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:[Rank 24] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40910.39501953125 | reserved: 46486.0 | max reserved: 46486.0 [default0]:[Rank 112] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37038.39501953125 | reserved: 42006.0 | max reserved: 42006.0 [default0]:[Rank 120] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36686.39501953125 | reserved: 42006.0 | max reserved: 42006.0 [default0]:[Rank 184] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33870.39501953125 | reserved: 39318.0 | max reserved: 39318.0 [default4]:[Rank 188] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33694.39501953125 | reserved: 40494.0 | max reserved: 40494.0 [default4]:[Rank 92] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37918.39501953125 | reserved: 44078.0 | max reserved: 44078.0 [default4]:[Rank 108] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 
37214.39501953125 | reserved: 44078.0 | max reserved: 44078.0 [default4]:[Rank 44] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40030.39501953125 | reserved: 45590.0 | max reserved: 45590.0 [default4]:[Rank 228] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31934.39501953125 | reserved: 37526.0 | max reserved: 37526.0 [default0]:[Rank 224] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32110.39501953125 | reserved: 37526.0 | max reserved: 37526.0 [default0]:[Rank 32] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40558.39501953125 | reserved: 45590.0 | max reserved: 45590.0 [default4]:[Rank 124] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36510.39501953125 | reserved: 42006.0 | max reserved: 42006.0 [default0]:[Rank 248] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31054.39501953125 | reserved: 36630.0 | max reserved: 36630.0 [default0]:[Rank 200] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33166.39501953125 | reserved: 38422.0 | max reserved: 38422.0 [default0]:[Rank 128] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36334.39501953125 | reserved: 43182.0 | max reserved: 43182.0 [default0]:[Rank 80] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38446.39501953125 | reserved: 43798.0 | max reserved: 43798.0 [default4]:[Rank 212] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32638.39501953125 | reserved: 38702.0 | max reserved: 38702.0 [default0]:[Rank 56] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39502.39501953125 | reserved: 44694.0 | max reserved: 44694.0 [default0]:[Rank 104] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37390.39501953125 | 
reserved: 44078.0 | max reserved: 44078.0 [default0]:[Rank 208] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32814.39501953125 | reserved: 38422.0 | max reserved: 38422.0 [default4]:[Rank 148] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35454.39501953125 | reserved: 42286.0 | max reserved: 42286.0 [default4]:[Rank 252] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30878.39501953125 | reserved: 36910.0 | max reserved: 36910.0 [default4]:[Rank 84] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38270.39501953125 | reserved: 43798.0 | max reserved: 43798.0 [default4]:[Rank 156] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35102.39501953125 | reserved: 40326.0 | max reserved: 40326.0 [default4]:[Rank 204] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32990.39501953125 | reserved: 38422.0 | max reserved: 38422.0 [default4]:[Rank 260] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30526.39501953125 | reserved: 35734.0 | max reserved: 35734.0 [default0]:[Rank 48] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39854.39501953125 | reserved: 45870.0 | max reserved: 45870.0 [default4]:[Rank 52] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39678.39501953125 | reserved: 44694.0 | max reserved: 44694.0 [default0]:[Rank 192] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33518.39501953125 | reserved: 39598.0 | max reserved: 39598.0 [default0]:[Rank 168] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34574.39501953125 | reserved: 41390.0 | max reserved: 41390.0 [default4]:[Rank 28] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40734.39501953125 | reserved: 46766.0 | max 
reserved: 46766.0 [default7]: iteration 1/ 3100 | consumed samples: 2048 | consumed tokens: 4194304 | elapsed time per iteration (s): 209.65 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 4.400736E+00 | grad norm: 60.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 9.769 | TFLOPs: 99.72 | [default0]:[Rank 160] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34926.39501953125 | reserved: 40214.0 | max reserved: 40214.0 [default4]:[Rank 196] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33342.39501953125 | reserved: 38422.0 | max reserved: 38422.0 [default0]:[Rank 88] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38094.39501953125 | reserved: 44078.0 | max reserved: 44078.0 [default0]:[Rank 136] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35982.39501953125 | reserved: 41222.0 | max reserved: 41222.0 [default0]:[Rank 256] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30702.39501953125 | reserved: 36910.0 | max reserved: 36910.0 [default0]:[Rank 240] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31406.39501953125 | reserved: 36630.0 | max reserved: 36630.0 [default0]:[Rank 40] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40206.39501953125 | reserved: 45590.0 | max reserved: 45590.0 [default0]:[Rank 264] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30350.39501953125 | reserved: 35734.0 | max reserved: 35734.0 [default4]:[Rank 140] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35806.39501953125 | reserved: 41110.0 | max reserved: 41110.0 [default0]:[Rank 8] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41614.39501953125 | reserved: 47662.0 | max 
reserved: 47662.0 [default0]:[Rank 176] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34222.39501953125 | reserved: 39430.0 | max reserved: 39430.0 [default4]:[Rank 180] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34046.39501953125 | reserved: 39318.0 | max reserved: 39318.0 [default4]:[Rank 100] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37566.39501953125 | reserved: 42902.0 | max reserved: 42902.0 [default4]:[Rank 36] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40382.39501953125 | reserved: 45590.0 | max reserved: 45590.0 [default4]:[Rank 268] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30174.39501953125 | reserved: 35734.0 | max reserved: 35734.0 [default0]:[Rank 144] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35630.39501953125 | reserved: 41110.0 | max reserved: 41110.0 [default0]:[Rank 16] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41262.39501953125 | reserved: 46486.0 | max reserved: 46486.0 [default0]:[Rank 216] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32462.39501953125 | reserved: 37526.0 | max reserved: 37526.0 [default0]:[Rank 96] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37742.39501953125 | reserved: 42902.0 | max reserved: 42902.0 [default4]:[Rank 116] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36862.39501953125 | reserved: 42006.0 | max reserved: 42006.0 [default4]:[Rank 68] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38974.39501953125 | reserved: 44974.0 | max reserved: 44974.0 [default0]:[Rank 64] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39150.39501953125 | reserved: 44694.0 | max reserved: 44694.0 
[default4]:[Rank 284] (after 1 iterations) memory (MB) | allocated: 41930.33251953125 | max allocated: 55650.33203125 | reserved: 68848.0 | max reserved: 68848.0 [default0]:[Rank 72] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38798.39501953125 | reserved: 43798.0 | max reserved: 43798.0 [default4]:[Rank 76] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38622.39501953125 | reserved: 43798.0 | max reserved: 43798.0 [default4]:[Rank 172] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34398.39501953125 | reserved: 40494.0 | max reserved: 40494.0 [default4]:[Rank 220] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32286.39501953125 | reserved: 37526.0 | max reserved: 37526.0 [default0]:[Rank 0] (after 1 iterations) memory (MB) | allocated: 38080.58544921875 | max allocated: 62086.80322265625 | reserved: 76022.0 | max reserved: 76022.0 [default4]:[Rank 20] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41086.39501953125 | reserved: 46486.0 | max reserved: 46486.0 [default0]:[Rank 272] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 29998.39501953125 | reserved: 36014.0 | max reserved: 36014.0 [default4]:[Rank 244] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31230.39501953125 | reserved: 36630.0 | max reserved: 36630.0 [default4]:[Rank 164] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34750.39501953125 | reserved: 40214.0 | max reserved: 40214.0 [default4]:[Rank 12] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41438.39501953125 | reserved: 46486.0 | max reserved: 46486.0 [default4]:[Rank 132] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36158.39501953125 | reserved: 41110.0 | max reserved: 41110.0 [default0]:[Rank 152] (after 
1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35278.39501953125 | reserved: 40214.0 | max reserved: 40214.0 [default4]:[Rank 60] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39326.39501953125 | reserved: 44694.0 | max reserved: 44694.0 [default0]:[Rank 232] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31758.39501953125 | reserved: 37806.0 | max reserved: 37806.0 [default4]:[Rank 276] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 29822.39501953125 | reserved: 34838.0 | max reserved: 34838.0 [default4]:[Rank 4] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41790.39501953125 | reserved: 47382.0 | max reserved: 47382.0 [default4]:[Rank 236] (after 1 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31582.39501953125 | reserved: 42118.0 | max reserved: 42118.0 [default0]:[Rank 280] (after 1 iterations) memory (MB) | allocated: 25990.69677734375 | max allocated: 29702.71142578125 | reserved: 34838.0 | max reserved: 34838.0 [default7]: iteration 2/ 3100 | consumed samples: 4096 | consumed tokens: 8388608 | elapsed time per iteration (s): 141.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 2.116342E+00 | grad norm: 29.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.463 | TFLOPs: 147.64 | [default7]: iteration 3/ 3100 | consumed samples: 6144 | consumed tokens: 12582912 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 3.848806E+00 | grad norm: 66.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.459 | TFLOPs: 147.61 | [default7]: iteration 4/ 3100 | consumed samples: 8192 | consumed tokens: 16777216 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 
| lm loss: 1.508053E+00 | grad norm: 8.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.01 | [default0]:saving checkpoint at iteration 5 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-03 19:54:20,024] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5 is begin to save! [default4]:[2022-09-03 19:54:20,046] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,046] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... 
[default7]: iteration 5/ 3100 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 141.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.499341E+00 | grad norm: 9.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.462 | TFLOPs: 147.64 | [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,105] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... 
[default4]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt... [default4]:[2022-09-03 19:54:20,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt. [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... 
[default4]:[2022-09-03 19:54:20,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... 
[default0]:[2022-09-03 19:54:20,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default0]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default4]:[2022-09-03 19:54:20,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default0]:[2022-09-03 19:54:23,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,272] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt... [default0]:[2022-09-03 19:54:23,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt. 
[default0]:[2022-09-03 19:54:23,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt... [default0]:[2022-09-03 19:54:23,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt. [default4]:[2022-09-03 19:54:23,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,334] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt... [default4]:[2022-09-03 19:54:23,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt. [default0]:[2022-09-03 19:54:23,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt... [default0]:[2022-09-03 19:54:23,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt. 
[default4]:[2022-09-03 19:54:23,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,396] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt... [default4]:[2022-09-03 19:54:23,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt. [default0]:[2022-09-03 19:54:23,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,467] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt... [default0]:[2022-09-03 19:54:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt. [default4]:[2022-09-03 19:54:23,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt... [default4]:[2022-09-03 19:54:23,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt. 
[default4]:[2022-09-03 19:54:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt... [default4]:[2022-09-03 19:54:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt. [default0]:[2022-09-03 19:54:23,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt... [default0]:[2022-09-03 19:54:23,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt. [default4]:[2022-09-03 19:54:23,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt... [default4]:[2022-09-03 19:54:23,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt. 
[default4]:[2022-09-03 19:54:23,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt... [default4]:[2022-09-03 19:54:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt. [default4]:[2022-09-03 19:54:23,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,525] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt... [default4]:[2022-09-03 19:54:23,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt. [default0]:[2022-09-03 19:54:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt... [default0]:[2022-09-03 19:54:23,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt. 
[default0]:[2022-09-03 19:54:23,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt... [default4]:[2022-09-03 19:54:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt... [default4]:[2022-09-03 19:54:23,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt. [default0]:[2022-09-03 19:54:23,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt... [default0]:[2022-09-03 19:54:23,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt. [default4]:[2022-09-03 19:54:23,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. 
[default4]:[2022-09-03 19:54:23,511] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt... [default4]:[2022-09-03 19:54:23,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt. [default4]:[2022-09-03 19:54:23,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt... [default4]:[2022-09-03 19:54:23,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt. [default4]:[2022-09-03 19:54:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt... [default4]:[2022-09-03 19:54:23,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt. [default0]:[2022-09-03 19:54:23,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. 
[default0]:[2022-09-03 19:54:23,536] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt... [default0]:[2022-09-03 19:54:23,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt. [default0]:[2022-09-03 19:54:23,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt... [default0]:[2022-09-03 19:54:23,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt. [default0]:[2022-09-03 19:54:23,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt... [default0]:[2022-09-03 19:54:23,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt. [default0]:[2022-09-03 19:54:23,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. 
[default0]:[2022-09-03 19:54:23,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt... [default4]:[2022-09-03 19:54:23,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt... [default4]:[2022-09-03 19:54:23,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt. [default4]:[2022-09-03 19:54:23,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt... [default4]:[2022-09-03 19:54:23,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt. [default4]:[2022-09-03 19:54:23,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt... 
[default4]:[2022-09-03 19:54:23,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt. [default0]:[2022-09-03 19:54:23,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt. [default0]:[2022-09-03 19:54:23,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt... [default0]:[2022-09-03 19:54:23,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt. [default0]:[2022-09-03 19:54:23,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt... [default0]:[2022-09-03 19:54:23,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt. [default4]:[2022-09-03 19:54:23,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. 
[default4]:[2022-09-03 19:54:23,640] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt... [default4]:[2022-09-03 19:54:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt. [default4]:[2022-09-03 19:54:23,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,654] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt... [default4]:[2022-09-03 19:54:23,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt. [default0]:[2022-09-03 19:54:23,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,661] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt... [default0]:[2022-09-03 19:54:23,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt. [default0]:[2022-09-03 19:54:23,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. 
[default0]:[2022-09-03 19:54:23,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt... [default0]:[2022-09-03 19:54:23,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt. [default0]:[2022-09-03 19:54:23,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt... [default0]:[2022-09-03 19:54:23,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt. [default0]:[2022-09-03 19:54:23,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt. [default0]:[2022-09-03 19:54:23,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt... [default0]:[2022-09-03 19:54:23,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt. 
[default4]:[2022-09-03 19:54:23,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt... [default4]:[2022-09-03 19:54:23,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt. [default0]:[2022-09-03 19:54:23,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt... [default0]:[2022-09-03 19:54:23,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt. [default4]:[2022-09-03 19:54:23,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt... [default4]:[2022-09-03 19:54:23,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt. 
[default4]:[2022-09-03 19:54:23,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt... [default4]:[2022-09-03 19:54:23,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt. [default4]:[2022-09-03 19:54:23,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt... [default4]:[2022-09-03 19:54:23,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt. [default4]:[2022-09-03 19:54:23,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt... [default4]:[2022-09-03 19:54:23,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt. 
[default0]:[2022-09-03 19:54:23,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt... [default0]:[2022-09-03 19:54:23,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt. [default0]:[2022-09-03 19:54:23,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt... [default0]:[2022-09-03 19:54:23,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt. [default4]:[2022-09-03 19:54:23,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt... [default4]:[2022-09-03 19:54:23,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt. 
[default4]:[2022-09-03 19:54:23,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default4]:[2022-09-03 19:54:23,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt... [default4]:[2022-09-03 19:54:23,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt. [default0]:[2022-09-03 19:54:23,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,892] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt... [default0]:[2022-09-03 19:54:23,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt. [default0]:[2022-09-03 19:54:23,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,932] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt... [default0]:[2022-09-03 19:54:23,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt. 
[default4]:[2022-09-03 19:54:23,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt... [default4]:[2022-09-03 19:54:24,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt. [default0]:[2022-09-03 19:54:23,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default0]:[2022-09-03 19:54:23,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt... [default0]:[2022-09-03 19:54:23,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt. [default0]:[2022-09-03 19:54:24,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt... [default0]:[2022-09-03 19:54:24,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt. 
[default4]:[2022-09-03 19:54:24,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt... [default4]:[2022-09-03 19:54:24,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt. [default4]:[2022-09-03 19:54:24,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,193] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt... [default4]:[2022-09-03 19:54:24,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt... [default4]:[2022-09-03 19:54:24,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt. [default4]:[2022-09-03 19:54:24,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt. 
[default4]:[2022-09-03 19:54:24,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,246] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt... [default4]:[2022-09-03 19:54:24,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt. [default0]:[2022-09-03 19:54:24,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,275] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt... [default0]:[2022-09-03 19:54:24,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt. [default0]:[2022-09-03 19:54:24,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt... [default0]:[2022-09-03 19:54:24,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt. 
[default0]:[2022-09-03 19:54:24,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt... [default0]:[2022-09-03 19:54:24,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt. [default0]:[2022-09-03 19:54:24,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt... [default0]:[2022-09-03 19:54:24,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt. [default0]:[2022-09-03 19:54:24,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt... [default0]:[2022-09-03 19:54:24,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt. 
[default4]:[2022-09-03 19:54:24,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt... [default4]:[2022-09-03 19:54:24,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt. [default4]:[2022-09-03 19:54:24,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt... [default4]:[2022-09-03 19:54:24,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt. [default0]:[2022-09-03 19:54:24,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt... [default0]:[2022-09-03 19:54:24,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt. 
[default4]:[2022-09-03 19:54:24,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,515] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt... [default4]:[2022-09-03 19:54:24,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt. [default0]:[2022-09-03 19:54:24,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt... [default0]:[2022-09-03 19:54:24,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt. [default0]:[2022-09-03 19:54:24,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default0]:[2022-09-03 19:54:24,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. 
[default0]:[2022-09-03 19:54:24,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt... [default0]:[2022-09-03 19:54:24,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt. [default4]:[2022-09-03 19:54:24,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt... [default4]:[2022-09-03 19:54:24,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt. [default4]:[2022-09-03 19:54:24,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt... [default4]:[2022-09-03 19:54:24,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt. [default4]:[2022-09-03 19:54:24,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. 
[default4]:[2022-09-03 19:54:24,791] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt... [default4]:[2022-09-03 19:54:24,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt. [default0]:[2022-09-03 19:54:24,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default0]:[2022-09-03 19:54:24,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt... [default0]:[2022-09-03 19:54:24,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt. [default4]:[2022-09-03 19:54:24,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default4]:[2022-09-03 19:54:24,942] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt... [default4]:[2022-09-03 19:54:24,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt. [default0]:[2022-09-03 19:54:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. 
[default0]:[2022-09-03 19:54:25,049] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt... [default0]:[2022-09-03 19:54:25,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt. [default0]:[2022-09-03 19:54:25,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default0]:[2022-09-03 19:54:25,912] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt [default0]:[2022-09-03 19:54:25,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:54:25,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... 
[default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... 
[default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... 
[default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... 
[default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... 
[default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... 
[default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... 
[default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... 
[default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... 
[default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... 
[default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... 
[default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... 
[default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... 
[default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... 
[default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... 
[default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... [default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... 
[default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... 
[default4]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... 
[default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default0]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default7]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default1]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... 
[default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default3]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default2]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default6]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default5]:[2022-09-03 19:54:26,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default2]:[2022-09-03 19:54:31,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. 
[default2]:[2022-09-03 19:54:31,900] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default1]:[2022-09-03 19:54:31,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-03 19:54:31,930] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default0]:[2022-09-03 19:54:32,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-03 19:54:32,300] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default3]:[2022-09-03 19:54:32,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-03 19:54:32,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default4]:[2022-09-03 19:54:32,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. 
[default4]:[2022-09-03 19:54:32,795] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default2]:[2022-09-03 19:54:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. [default2]:[2022-09-03 19:54:33,018] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default6]:[2022-09-03 19:54:33,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-03 19:54:33,353] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default7]:[2022-09-03 19:54:33,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. [default7]:[2022-09-03 19:54:33,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default1]:[2022-09-03 19:54:33,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. 
[default1]:[2022-09-03 19:54:33,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default5]:[2022-09-03 19:54:33,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. [default5]:[2022-09-03 19:54:33,584] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default4]:[2022-09-03 19:54:33,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. [default4]:[2022-09-03 19:54:33,687] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default7]:[2022-09-03 19:54:33,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. [default7]:[2022-09-03 19:54:33,654] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default5]:[2022-09-03 19:54:33,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. 
[default5]:[2022-09-03 19:54:33,660] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default0]:[2022-09-03 19:54:33,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. [default0]:[2022-09-03 19:54:33,768] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default0]:[2022-09-03 19:54:33,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-03 19:54:33,814] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default2]:[2022-09-03 19:54:33,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. [default2]:[2022-09-03 19:54:33,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default2]:[2022-09-03 19:54:33,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. 
[default2]:[2022-09-03 19:54:33,909] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default5]:[2022-09-03 19:54:34,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. [default5]:[2022-09-03 19:54:34,059] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default7]:[2022-09-03 19:54:34,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. [default7]:[2022-09-03 19:54:34,111] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default3]:[2022-09-03 19:54:34,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-03 19:54:34,199] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default7]:[2022-09-03 19:54:34,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. 
[default7]:[2022-09-03 19:54:34,132] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default1]:[2022-09-03 19:54:34,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. [default1]:[2022-09-03 19:54:34,122] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default1]:[2022-09-03 19:54:34,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. [default1]:[2022-09-03 19:54:34,185] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default4]:[2022-09-03 19:54:34,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-03 19:54:34,198] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default6]:[2022-09-03 19:54:34,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. 
[default6]:[2022-09-03 19:54:34,189] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default0]:[2022-09-03 19:54:34,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. [default0]:[2022-09-03 19:54:34,295] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default2]:[2022-09-03 19:54:34,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. [default2]:[2022-09-03 19:54:34,296] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default3]:[2022-09-03 19:54:34,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-03 19:54:34,300] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default5]:[2022-09-03 19:54:34,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. 
[default5]:[2022-09-03 19:54:34,380] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default2]:[2022-09-03 19:54:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. [default2]:[2022-09-03 19:54:34,383] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default5]:[2022-09-03 19:54:34,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-03 19:54:34,331] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default3]:[2022-09-03 19:54:34,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. [default3]:[2022-09-03 19:54:34,367] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default3]:[2022-09-03 19:54:34,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. 
[default3]:[2022-09-03 19:54:34,401] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default6]:[2022-09-03 19:54:34,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-03 19:54:34,521] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default6]:[2022-09-03 19:54:34,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-03 19:54:34,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default4]:[2022-09-03 19:54:34,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. [default4]:[2022-09-03 19:54:34,519] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default2]:[2022-09-03 19:54:34,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. 
[default2]:[2022-09-03 19:54:34,498] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default0]:[2022-09-03 19:54:34,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. [default0]:[2022-09-03 19:54:34,585] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default3]:[2022-09-03 19:54:34,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-03 19:54:34,572] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default4]:[2022-09-03 19:54:34,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-03 19:54:34,626] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default6]:[2022-09-03 19:54:34,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. 
[default6]:[2022-09-03 19:54:34,644] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default0]:[2022-09-03 19:54:34,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. [default0]:[2022-09-03 19:54:34,693] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default2]:[2022-09-03 19:54:34,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. [default2]:[2022-09-03 19:54:34,636] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default1]:[2022-09-03 19:54:34,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-03 19:54:34,728] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default4]:[2022-09-03 19:54:34,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. 
[default4]:[2022-09-03 19:54:34,741] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default1]:[2022-09-03 19:54:34,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. [default1]:[2022-09-03 19:54:34,740] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default0]:[2022-09-03 19:54:34,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. [default0]:[2022-09-03 19:54:34,733] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default6]:[2022-09-03 19:54:34,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. [default6]:[2022-09-03 19:54:34,842] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default5]:[2022-09-03 19:54:34,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. 
[default5]:[2022-09-03 19:54:34,883] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default1]:[2022-09-03 19:54:34,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. [default1]:[2022-09-03 19:54:34,834] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default7]:[2022-09-03 19:54:34,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. [default7]:[2022-09-03 19:54:34,875] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default1]:[2022-09-03 19:54:34,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-03 19:54:34,901] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default5]:[2022-09-03 19:54:35,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. 
[default5]:[2022-09-03 19:54:35,002] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default4]:[2022-09-03 19:54:34,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. [default4]:[2022-09-03 19:54:34,972] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default6]:[2022-09-03 19:54:34,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. [default6]:[2022-09-03 19:54:34,954] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default4]:[2022-09-03 19:54:35,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-03 19:54:35,063] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default5]:[2022-09-03 19:54:35,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. 
[default5]:[2022-09-03 19:54:35,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default7]:[2022-09-03 19:54:35,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. [default7]:[2022-09-03 19:54:35,114] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default4]:[2022-09-03 19:54:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. [default4]:[2022-09-03 19:54:35,145] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default4]:[2022-09-03 19:54:35,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-03 19:54:35,117] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default3]:[2022-09-03 19:54:35,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. 
[default3]:[2022-09-03 19:54:35,203] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default1]:[2022-09-03 19:54:35,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-03 19:54:35,271] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default1]:[2022-09-03 19:54:35,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. [default1]:[2022-09-03 19:54:35,396] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default5]:[2022-09-03 19:54:35,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-03 19:54:35,410] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default0]:[2022-09-03 19:54:35,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. 
[default0]:[2022-09-03 19:54:35,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default7]:[2022-09-03 19:54:35,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. [default7]:[2022-09-03 19:54:35,412] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default3]:[2022-09-03 19:54:35,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-03 19:54:35,479] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default3]:[2022-09-03 19:54:35,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-03 19:54:35,445] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default2]:[2022-09-03 19:54:35,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. 
[default2]:[2022-09-03 19:54:35,626] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default7]:[2022-09-03 19:54:35,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-03 19:54:35,619] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default7]:[2022-09-03 19:54:35,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-03 19:54:35,531] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default0]:[2022-09-03 19:54:35,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-03 19:54:35,596] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default6]:[2022-09-03 19:54:35,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. 
[default6]:[2022-09-03 19:54:35,637] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default2]:[2022-09-03 19:54:35,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. [default2]:[2022-09-03 19:54:35,792] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default5]:[2022-09-03 19:54:35,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-03 19:54:35,737] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default3]:[2022-09-03 19:54:35,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. [default3]:[2022-09-03 19:54:35,769] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default0]:[2022-09-03 19:54:35,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. 
[default0]:[2022-09-03 19:54:35,811] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default7]:[2022-09-03 19:54:35,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. [default7]:[2022-09-03 19:54:35,815] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default6]:[2022-09-03 19:54:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-03 19:54:35,808] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default6]:[2022-09-03 19:54:35,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. [default6]:[2022-09-03 19:54:35,817] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default6]:[2022-09-03 19:54:35,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. 
[default6]:[2022-09-03 19:54:35,895] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default5]:[2022-09-03 19:54:36,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. [default5]:[2022-09-03 19:54:36,175] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default4]:[2022-09-03 19:54:36,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. [default4]:[2022-09-03 19:54:36,420] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default7]:[2022-09-03 19:54:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. [default7]:[2022-09-03 19:54:36,470] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default7]:[2022-09-03 19:54:36,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. 
[default7]:[2022-09-03 19:54:36,766] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default3]:[2022-09-03 19:54:36,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-03 19:54:36,916] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default3]:[2022-09-03 19:54:36,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. [default3]:[2022-09-03 19:54:36,964] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default5]:[2022-09-03 19:54:37,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-03 19:54:37,127] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default3]:[2022-09-03 19:54:37,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. 
[default3]:[2022-09-03 19:54:37,248] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default2]:[2022-09-03 19:54:37,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-03 19:54:37,422] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default4]:[2022-09-03 19:54:37,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-03 19:54:37,439] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default5]:[2022-09-03 19:54:37,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. [default5]:[2022-09-03 19:54:37,451] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default0]:[2022-09-03 19:54:37,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. 
[default0]:[2022-09-03 19:54:37,613] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default2]:[2022-09-03 19:54:37,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. [default2]:[2022-09-03 19:54:37,579] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default0]:[2022-09-03 19:54:37,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-03 19:54:37,910] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default1]:[2022-09-03 19:54:37,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-03 19:54:38,000] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default2]:[2022-09-03 19:54:37,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. 
[default2]:[2022-09-03 19:54:37,962] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default7]:[2022-09-03 19:54:37,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. [default7]:[2022-09-03 19:54:37,986] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default4]:[2022-09-03 19:54:38,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-03 19:54:38,124] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default6]:[2022-09-03 19:54:38,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. [default6]:[2022-09-03 19:54:38,157] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default0]:[2022-09-03 19:54:38,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. 
[default0]:[2022-09-03 19:54:38,213] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default7]:[2022-09-03 19:54:38,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. [default7]:[2022-09-03 19:54:38,197] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default1]:[2022-09-03 19:54:38,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-03 19:54:38,241] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default2]:[2022-09-03 19:54:38,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. [default2]:[2022-09-03 19:54:38,444] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default5]:[2022-09-03 19:54:38,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. 
[default5]:[2022-09-03 19:54:38,460] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default6]:[2022-09-03 19:54:38,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. [default6]:[2022-09-03 19:54:38,467] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default1]:[2022-09-03 19:54:38,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. [default1]:[2022-09-03 19:54:38,488] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default2]:[2022-09-03 19:54:38,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. [default2]:[2022-09-03 19:54:38,499] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default4]:[2022-09-03 19:54:38,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. 
[default4]:[2022-09-03 19:54:38,507] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default3]:[2022-09-03 19:54:38,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-03 19:54:38,602] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default0]:[2022-09-03 19:54:38,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. [default0]:[2022-09-03 19:54:38,568] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default1]:[2022-09-03 19:54:38,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. [default1]:[2022-09-03 19:54:38,600] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default1]:[2022-09-03 19:54:38,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. 
[default1]:[2022-09-03 19:54:38,692] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default6]:[2022-09-03 19:54:38,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. [default6]:[2022-09-03 19:54:38,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default7]:[2022-09-03 19:54:38,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-03 19:54:38,784] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default0]:[2022-09-03 19:54:38,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-03 19:54:38,866] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default6]:[2022-09-03 19:54:38,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. 
[default6]:[2022-09-03 19:54:38,923] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default3]:[2022-09-03 19:54:38,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-03 19:54:38,882] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default0]:[2022-09-03 19:54:38,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-03 19:54:38,936] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default6]:[2022-09-03 19:54:38,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. [default6]:[2022-09-03 19:54:38,955] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default2]:[2022-09-03 19:54:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
[default2]:[2022-09-03 19:54:39,041] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default3]:[2022-09-03 19:54:39,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. [default3]:[2022-09-03 19:54:39,054] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default7]:[2022-09-03 19:54:39,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. [default7]:[2022-09-03 19:54:39,028] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default1]:[2022-09-03 19:54:39,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-03 19:54:39,093] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default5]:[2022-09-03 19:54:39,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. 
[default5]:[2022-09-03 19:54:39,198] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default1]:[2022-09-03 19:54:39,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-03 19:54:39,245] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default0]:[2022-09-03 19:54:39,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-03 19:54:39,272] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default4]:[2022-09-03 19:54:39,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-03 19:54:39,521] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default6]:[2022-09-03 19:54:39,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. 
[default6]:[2022-09-03 19:54:39,617] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default5]:[2022-09-03 19:54:39,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. [default5]:[2022-09-03 19:54:39,735] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default7]:[2022-09-03 19:54:39,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-03 19:54:39,680] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default0]:[2022-09-03 19:54:39,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-03 19:54:39,700] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default2]:[2022-09-03 19:54:39,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. 
[default2]:[2022-09-03 19:54:39,760] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default3]:[2022-09-03 19:54:39,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-03 19:54:39,751] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default3]:[2022-09-03 19:54:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-03 19:54:39,739] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default4]:[2022-09-03 19:54:39,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. [default4]:[2022-09-03 19:54:39,834] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default7]:[2022-09-03 19:54:39,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. 
[default7]:[2022-09-03 19:54:39,972] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default3]:[2022-09-03 19:54:40,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. [default3]:[2022-09-03 19:54:40,048] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default5]:[2022-09-03 19:54:40,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-03 19:54:40,109] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default6]:[2022-09-03 19:54:40,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. [default6]:[2022-09-03 19:54:40,227] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default2]:[2022-09-03 19:54:40,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. 
[default2]:[2022-09-03 19:54:40,189] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default3]:[2022-09-03 19:54:40,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-03 19:54:40,261] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default4]:[2022-09-03 19:54:40,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-03 19:54:40,304] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default7]:[2022-09-03 19:54:40,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. [default7]:[2022-09-03 19:54:40,430] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default4]:[2022-09-03 19:54:40,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. 
[default4]:[2022-09-03 19:54:40,439] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default1]:[2022-09-03 19:54:40,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. [default1]:[2022-09-03 19:54:40,516] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default2]:[2022-09-03 19:54:40,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-03 19:54:40,484] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default5]:[2022-09-03 19:54:40,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. [default5]:[2022-09-03 19:54:40,484] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default5]:[2022-09-03 19:54:40,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. 
[default5]:[2022-09-03 19:54:40,558] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default4]:[2022-09-03 19:54:40,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-03 19:54:40,689] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default2]:[2022-09-03 19:54:40,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-03 19:54:40,763] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default3]:[2022-09-03 19:54:40,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. [default3]:[2022-09-03 19:54:40,914] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default0]:[2022-09-03 19:54:41,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[default1]:[2022-09-03 19:54:41,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-03 19:54:41,092] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default6]:[2022-09-03 19:54:41,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. [default6]:[2022-09-03 19:54:41,151] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default5]:[2022-09-03 19:54:41,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. [default5]:[2022-09-03 19:54:41,322] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default0]:[2022-09-03 19:54:41,316] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [default4]:[2022-09-03 19:54:41,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. 
[default4]:[2022-09-03 19:54:41,880] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default2]:[2022-09-03 19:54:42,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-03 19:54:42,008] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default3]:[2022-09-03 19:54:42,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-03 19:54:42,352] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default0]:[2022-09-03 19:54:42,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. [default0]:[2022-09-03 19:54:42,594] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default1]:[2022-09-03 19:54:42,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. 
[default1]:[2022-09-03 19:54:42,722] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default2]:[2022-09-03 19:54:42,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. [default2]:[2022-09-03 19:54:42,675] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default6]:[2022-09-03 19:54:42,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. [default6]:[2022-09-03 19:54:42,742] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default3]:[2022-09-03 19:54:42,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. [default3]:[2022-09-03 19:54:42,753] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default7]:[2022-09-03 19:54:42,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. 
[default7]:[2022-09-03 19:54:42,948] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default7]:[2022-09-03 19:54:43,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-03 19:54:43,067] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default6]:[2022-09-03 19:54:42,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. [default6]:[2022-09-03 19:54:42,962] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default0]:[2022-09-03 19:54:43,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. [default0]:[2022-09-03 19:54:43,094] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default6]:[2022-09-03 19:54:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. 
[default6]:[2022-09-03 19:54:43,156] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default7]:[2022-09-03 19:54:43,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-03 19:54:43,153] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default4]:[2022-09-03 19:54:43,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. [default4]:[2022-09-03 19:54:43,109] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default3]:[2022-09-03 19:54:43,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-03 19:54:43,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default2]:[2022-09-03 19:54:43,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. 
[default2]:[2022-09-03 19:54:43,121] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default0]:[2022-09-03 19:54:43,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. [default0]:[2022-09-03 19:54:43,200] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default1]:[2022-09-03 19:54:43,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. [default1]:[2022-09-03 19:54:43,206] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default2]:[2022-09-03 19:54:43,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. [default2]:[2022-09-03 19:54:43,252] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default3]:[2022-09-03 19:54:43,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. 
[default3]:[2022-09-03 19:54:43,278] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default3]:[2022-09-03 19:54:43,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. [default3]:[2022-09-03 19:54:43,333] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default5]:[2022-09-03 19:54:43,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-03 19:54:43,312] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default4]:[2022-09-03 19:54:43,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. [default4]:[2022-09-03 19:54:43,371] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default0]:[2022-09-03 19:54:43,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. 
[default0]:[2022-09-03 19:54:43,394] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default5]:[2022-09-03 19:54:43,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-03 19:54:43,735] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default4]:[2022-09-03 19:54:43,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-03 19:54:43,673] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default6]:[2022-09-03 19:54:43,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. [default6]:[2022-09-03 19:54:43,690] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default1]:[2022-09-03 19:54:43,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. 
[default1]:[2022-09-03 19:54:43,728] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default0]:[2022-09-03 19:54:43,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. [default0]:[2022-09-03 19:54:43,830] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default0]:[2022-09-03 19:54:43,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-03 19:54:43,961] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default3]:[2022-09-03 19:54:43,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-03 19:54:43,997] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default5]:[2022-09-03 19:54:44,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. 
[default5]:[2022-09-03 19:54:44,361] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default2]:[2022-09-03 19:54:44,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. [default2]:[2022-09-03 19:54:44,379] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default2]:[2022-09-03 19:54:44,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. [default2]:[2022-09-03 19:54:44,692] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default2]:[2022-09-03 19:54:44,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. [default2]:[2022-09-03 19:54:44,784] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default5]:[2022-09-03 19:54:44,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. 
[default5]:[2022-09-03 19:54:44,781] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default1]:[2022-09-03 19:54:44,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. [default1]:[2022-09-03 19:54:44,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default4]:[2022-09-03 19:54:44,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. [default4]:[2022-09-03 19:54:44,891] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default3]:[2022-09-03 19:54:44,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. [default3]:[2022-09-03 19:54:44,977] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default5]:[2022-09-03 19:54:45,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. 
[default5]:[2022-09-03 19:54:45,004] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default6]:[2022-09-03 19:54:45,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-03 19:54:45,048] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default1]:[2022-09-03 19:54:45,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-03 19:54:45,130] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default2]:[2022-09-03 19:54:45,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-03 19:54:45,215] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default5]:[2022-09-03 19:54:45,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. 
[default5]:[2022-09-03 19:54:45,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default7]:[2022-09-03 19:54:45,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. [default7]:[2022-09-03 19:54:45,420] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default2]:[2022-09-03 19:54:45,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. [default2]:[2022-09-03 19:54:45,628] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default6]:[2022-09-03 19:54:45,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. [default6]:[2022-09-03 19:54:45,649] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default7]:[2022-09-03 19:54:45,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. 
[default7]:[2022-09-03 19:54:45,755] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default4]:[2022-09-03 19:54:45,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-03 19:54:45,696] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default7]:[2022-09-03 19:54:45,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-03 19:54:45,773] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default6]:[2022-09-03 19:54:45,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-03 19:54:45,852] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default2]:[2022-09-03 19:54:45,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. 
[default2]:[2022-09-03 19:54:45,921] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default4]:[2022-09-03 19:54:45,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. [default4]:[2022-09-03 19:54:45,964] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default0]:[2022-09-03 19:54:45,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. [default0]:[2022-09-03 19:54:45,988] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default6]:[2022-09-03 19:54:46,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-03 19:54:46,040] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default7]:[2022-09-03 19:54:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. 
[default7]:[2022-09-03 19:54:46,034] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default4]:[2022-09-03 19:54:46,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. [default4]:[2022-09-03 19:54:46,062] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default0]:[2022-09-03 19:54:46,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. [default0]:[2022-09-03 19:54:46,123] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default6]:[2022-09-03 19:54:46,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. [default6]:[2022-09-03 19:54:46,367] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default3]:[2022-09-03 19:54:46,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. 
[default3]:[2022-09-03 19:54:46,438] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default1]:[2022-09-03 19:54:46,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. [default1]:[2022-09-03 19:54:46,525] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default4]:[2022-09-03 19:54:46,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-03 19:54:46,510] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default0]:[2022-09-03 19:54:46,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. [default0]:[2022-09-03 19:54:46,350] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default1]:[2022-09-03 19:54:46,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. 
[default1]:[2022-09-03 19:54:46,526] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default1]:[2022-09-03 19:54:46,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-03 19:54:46,499] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default0]:[2022-09-03 19:54:46,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. [default0]:[2022-09-03 19:54:46,667] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default5]:[2022-09-03 19:54:46,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-03 19:54:46,706] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default2]:[2022-09-03 19:54:46,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. 
[default2]:[2022-09-03 19:54:46,706] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default3]:[2022-09-03 19:54:46,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. [default3]:[2022-09-03 19:54:46,693] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default3]:[2022-09-03 19:54:46,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. [default3]:[2022-09-03 19:54:46,780] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default4]:[2022-09-03 19:54:47,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-03 19:54:47,130] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default7]:[2022-09-03 19:54:47,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. 
[default7]:[2022-09-03 19:54:47,085] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default5]:[2022-09-03 19:54:47,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. [default5]:[2022-09-03 19:54:47,150] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default2]:[2022-09-03 19:54:47,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. [default2]:[2022-09-03 19:54:47,195] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default5]:[2022-09-03 19:54:47,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. [default5]:[2022-09-03 19:54:47,238] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default1]:[2022-09-03 19:54:47,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. 
[default1]:[2022-09-03 19:54:47,199] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default5]:[2022-09-03 19:54:47,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-03 19:54:47,233] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default3]:[2022-09-03 19:54:47,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-03 19:54:47,320] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default6]:[2022-09-03 19:54:47,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. [default6]:[2022-09-03 19:54:47,336] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default2]:[2022-09-03 19:54:47,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. 
[default2]:[2022-09-03 19:54:47,368] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default6]:[2022-09-03 19:54:47,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. [default6]:[2022-09-03 19:54:47,426] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default7]:[2022-09-03 19:54:47,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. [default7]:[2022-09-03 19:54:47,400] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default7]:[2022-09-03 19:54:47,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-03 19:54:47,435] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default7]:[2022-09-03 19:54:47,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. 
[default7]:[2022-09-03 19:54:47,530] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default6]:[2022-09-03 19:54:47,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. [default6]:[2022-09-03 19:54:47,759] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default7]:[2022-09-03 19:54:47,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. [default7]:[2022-09-03 19:54:47,860] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default4]:[2022-09-03 19:54:47,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-03 19:54:47,975] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default0]:[2022-09-03 19:54:48,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. 
[default0]:[2022-09-03 19:54:48,019] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default1]:[2022-09-03 19:54:48,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-03 19:54:48,230] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default6]:[2022-09-03 19:54:48,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. [default6]:[2022-09-03 19:54:48,292] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default4]:[2022-09-03 19:54:48,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. [default4]:[2022-09-03 19:54:48,276] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default0]:[2022-09-03 19:54:48,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. 
[default0]:[2022-09-03 19:54:48,325] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default5]:[2022-09-03 19:54:48,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. [default5]:[2022-09-03 19:54:48,295] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default1]:[2022-09-03 19:54:48,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-03 19:54:48,429] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default1]:[2022-09-03 19:54:48,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. [default1]:[2022-09-03 19:54:48,366] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default4]:[2022-09-03 19:54:48,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. 
[default4]:[2022-09-03 19:54:48,477] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default7]:[2022-09-03 19:54:48,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. [default7]:[2022-09-03 19:54:48,633] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default0]:[2022-09-03 19:54:48,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-03 19:54:48,604] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default5]:[2022-09-03 19:54:48,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-03 19:54:48,637] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default1]:[2022-09-03 19:54:48,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. 
[default1]:[2022-09-03 19:54:48,699] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default7]:[2022-09-03 19:54:48,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-03 19:54:48,859] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default6]:[2022-09-03 19:54:48,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-03 19:54:48,904] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default3]:[2022-09-03 19:54:48,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-03 19:54:48,978] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default0]:[2022-09-03 19:54:48,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. 
[default0]:[2022-09-03 19:54:48,969] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default1]:[2022-09-03 19:54:49,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-03 19:54:49,068] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default3]:[2022-09-03 19:54:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-03 19:54:49,154] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default1]:[2022-09-03 19:54:49,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-03 19:54:49,160] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default2]:[2022-09-03 19:54:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. 
[default2]:[2022-09-03 19:54:49,172] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default0]:[2022-09-03 19:54:49,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-03 19:54:49,433] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default0]:[2022-09-03 19:54:49,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. [default0]:[2022-09-03 19:54:49,860] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default5]:[2022-09-03 19:54:50,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-03 19:54:50,139] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default6]:[2022-09-03 19:54:50,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. 
[default6]:[2022-09-03 19:54:50,180] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default4]:[2022-09-03 19:54:50,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-03 19:54:50,198] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default7]:[2022-09-03 19:54:50,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. [default7]:[2022-09-03 19:54:50,236] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt [default2]:[2022-09-03 19:54:50,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. [default2]:[2022-09-03 19:54:50,274] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default3]:[2022-09-03 19:54:50,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. 
[default3]:[2022-09-03 19:54:50,409] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default7]:[2022-09-03 19:54:50,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. [default7]:[2022-09-03 19:54:50,510] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default1]:[2022-09-03 19:54:50,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-03 19:54:50,515] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default6]:[2022-09-03 19:54:50,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-03 19:54:50,509] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default1]:[2022-09-03 19:54:50,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. 
[default1]:[2022-09-03 19:54:50,973] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default5]:[2022-09-03 19:54:51,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. [default5]:[2022-09-03 19:54:51,345] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default4]:[2022-09-03 19:54:51,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. [default4]:[2022-09-03 19:54:51,347] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt [default3]:[2022-09-03 19:54:51,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-03 19:54:51,978] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default2]:[2022-09-03 19:54:54,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. 
[default2]:[2022-09-03 19:54:54,363] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default5]:[2022-09-03 19:54:54,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. [default5]:[2022-09-03 19:54:54,558] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default4]:[2022-09-03 19:54:54,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-03 19:54:54,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default0]:[2022-09-03 19:54:55,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-03 19:54:55,302] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default7]:[2022-09-03 19:54:55,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. 
[default7]:[2022-09-03 19:54:55,795] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default6]:[2022-09-03 19:54:56,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. [default6]:[2022-09-03 19:54:56,936] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default5]:[2022-09-03 19:54:57,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. [default5]:[2022-09-03 19:54:57,075] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-03 19:54:57,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default0]: successfully saved checkpoint at iteration 5 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:time (ms) | save-checkpoint: 37562.63
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default5]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default4]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default2]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default1]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default3]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default6]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! [default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now! 
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default0]:[2022-09-03 19:54:57,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5 is ready now!
[default7]: iteration 6/ 3100 | consumed samples: 12288 | consumed tokens: 25165824 | elapsed time per iteration (s): 178.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.771013E+00 | grad norm: 13.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 11.450 | TFLOPs: 116.88 |
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635356 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635357 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894381 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023576 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894382 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023577 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983845 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting
down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635358 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020274 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611427 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635359 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983846 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138837 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231227 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964177 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020275 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251884 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894383 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635360 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611428 
closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023578 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516098 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934574 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231228 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516977 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423378 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635361 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138838 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251885 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964178 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446240 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516978 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516099 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595931 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 
1983847 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635362 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934575 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958048 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423379 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020276 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611429 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974822 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555396 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3635363 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516100 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803694 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640760 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958049 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138839 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446241 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231229 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251886 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974823 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516979 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964179 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373755 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516101 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555397 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934576 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423380 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516102 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156170 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787649 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894384 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640761 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983848 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373756 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958050 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595932 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446242 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377149 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516103 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020277 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023579 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611430 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322747 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156171 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787650 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516980 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983849 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974824 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555398 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958051 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516104 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251887 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934577 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020278 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023580 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322748 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423381 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156172 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983850 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787651 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640762 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717896 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516981 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516105 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373757 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377150 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958052 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894385 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974825 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 2673410 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251888 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020279 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023581 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983851 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156173 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787652 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595933 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640763 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611431 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516982 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595934 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964180 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958053 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023582 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894386 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717897 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231230 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583568 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673411 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1983852 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251889 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020280 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156174 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555399 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787653 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930967 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516983 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958054 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138840 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934578 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894387 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640764 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423382 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3023583 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964181 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373758 closing 
signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917693 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377151 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611432 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2020281 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595935 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251890 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156175 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555400 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3958055 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787654 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583569 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934579 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1894388 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 516985 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930968 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446243 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138841 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640765 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964182 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death 
signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611433 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917694 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156176 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974826 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984780 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555401 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595936 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043704 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803695 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 251891 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717898 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787655 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781530 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673412 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322749 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446244 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138842 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964183 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640766 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3611434 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934580 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231231 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3156177 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974827 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555402 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3787656 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803696 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595937 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377152 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138843 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984781 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446245 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423383 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2934581 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043705 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1964184 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2640767 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231232 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1555403 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781531 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974828 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930969 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803697 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423384 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3595938 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446246 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377153 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984782 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2138844 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043706 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917695 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1974829 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231233 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373759 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322751 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 423385 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 1803698 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377154 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043707 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1446247 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984783 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717899 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2231234 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377155 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803699 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043708 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984784 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583570 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717900 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781532 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373760 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1377156 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803700 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717901 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043709 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930970 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984785 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673413 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583571 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917696 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043710 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1803701 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717902 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984786 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583572 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930971 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322752 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917697 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1717903 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3043711 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2984787 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583573 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930972 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917698 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583574 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930973 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322753 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673414 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1583575 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 930974 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373761 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781533 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917699 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322754 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3917700 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781534 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 373762 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1322755 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781535 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781536 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673415 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1781537 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673416 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2673417 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 411447 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411448 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411449 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411450 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411451 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411452 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411453 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 411454 closing signal SIGTERM slurmstepd: error: *** STEP 927375.0 ON jean-zay-iam02 CANCELLED AT 2022-09-03T19:57:26 *** WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639084 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639085 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639086 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639087 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639088 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639089 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639091 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3639092 closing signal SIGTERM Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1894299 got signal: 15 Traceback (most recent 
call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise 
SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3639041 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper return _run_code(code, main_globals, None, elastic_launch( result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1717854 got signal: 15 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = self._invoke_run(role) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 251846 got signal: 15 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run 
return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2934526 got signal: 15 result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3917649 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper main() result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 516059 got signal: 15 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ result = self._invoke_run(role) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 516939 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1446201 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1803650 got signal: 15 return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2984698 got signal: 15 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 423339 got signal: 15 exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2640711 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, 
in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1781488 got signal: 15 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper time.sleep(monitor_interval) time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1555352 got signal: 15 torch.distributed.elastic.multiprocessing.api.SignalException: Process 1983802 got signal: 15 run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3635310 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 411404 got signal: 15 raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) time.sleep(monitor_interval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2673371 got signal: 15 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3156132 
got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in 
_terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3958007 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1377111 got signal: 15 result = f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3023533 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1583526 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code result = f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3787600 got signal: 15 return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1974783 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", 
line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2231183 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2020236 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, 
in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1964094 got signal: 15 Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper run(args) result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 
850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run elastic_launch( raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.api.SignalException: Process 373717 got signal: 15 time.sleep(monitor_interval) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3611379 got signal: 15 result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = 
self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3595891 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 930927 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2138793 got 
signal: 15 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3043661 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1322707 got signal: 15 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 
0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... 
None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.927431.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 1 [default0]: eval_only ....................................... None [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. 
None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ 
None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... None [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... 
None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. None [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... 
False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 249 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... 
None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ False [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... 
[['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... [['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... 
False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. loading tokenizer from pre-trained model [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... 
['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-03 19:58:07,427] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default7]:> setting tensorboard ... [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-03 19:58:14,164] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.093 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. 
[default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.006 seconds [default0]:time to initialize megatron (seconds): 61.265 [default0]:[after megatron is initialized] datetime: 2022-09-03 19:58:21 [default0]:building GPT model ... 
[default0]:[2022-09-03 19:58:21,312] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-03 19:58:21,313] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-03 19:58:21,313] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.09 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287}
[default0]:[2022-09-03 19:58:25,259] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding
[default0]:stage=0 layers=3
[default0]: 0: _to_float16
[default0]: 1: EmbeddingPipe
[default0]: 2:
[default0]:stage=1 layers=1
[default0]: 3: ParallelTransformerLayerPipe
[default0]:stage=2 layers=1
[default0]: 4: ParallelTransformerLayerPipe
[default0]:stage=3 layers=1
[default0]: 5: ParallelTransformerLayerPipe
[default0]:stage=4 layers=1
[default0]: 6: ParallelTransformerLayerPipe
[default0]:stage=5 layers=1
[default0]: 7: ParallelTransformerLayerPipe
[default0]:stage=6 layers=1
[default0]: 8: ParallelTransformerLayerPipe
[default0]:stage=7 layers=1
[default0]: 9: ParallelTransformerLayerPipe
[default0]:stage=8 layers=1
[default0]: 10: ParallelTransformerLayerPipe
[default0]:stage=9 layers=1
[default0]: 11: ParallelTransformerLayerPipe
[default0]:stage=10 layers=1
[default0]: 12: ParallelTransformerLayerPipe
[default0]:stage=11 layers=1
[default0]: 13: ParallelTransformerLayerPipe
[default0]:stage=12 layers=1
[default0]: 14: ParallelTransformerLayerPipe
[default0]:stage=13 layers=1
[default0]: 15: ParallelTransformerLayerPipe
[default0]:stage=14 layers=1
[default0]: 16: ParallelTransformerLayerPipe
[default0]:stage=15 layers=1
[default0]: 17: ParallelTransformerLayerPipe
[default0]:stage=16 layers=1
[default0]: 18: ParallelTransformerLayerPipe
[default0]:stage=17 layers=1
[default0]: 19: ParallelTransformerLayerPipe
[default0]:stage=18 layers=1
[default0]: 20: ParallelTransformerLayerPipe
[default0]:stage=19 layers=1
[default0]: 21: ParallelTransformerLayerPipe
[default0]:stage=20 layers=1
[default0]: 22: ParallelTransformerLayerPipe
[default0]:stage=21 layers=1
[default0]: 23: ParallelTransformerLayerPipe
[default0]:stage=22 layers=1
[default0]: 24: ParallelTransformerLayerPipe
[default0]:stage=23 layers=1
[default0]: 25: ParallelTransformerLayerPipe
[default0]:stage=24 layers=1
[default0]: 26: ParallelTransformerLayerPipe
[default0]:stage=25 layers=1
[default0]: 27: ParallelTransformerLayerPipe
[default0]:stage=26 layers=1
[default0]: 28: ParallelTransformerLayerPipe
[default0]:stage=27 layers=1
[default0]: 29: ParallelTransformerLayerPipe
[default0]:stage=28 layers=1
[default0]: 30: ParallelTransformerLayerPipe
[default0]:stage=29 layers=1
[default0]: 31: ParallelTransformerLayerPipe
[default0]:stage=30 layers=1
[default0]: 32: ParallelTransformerLayerPipe
[default0]:stage=31 layers=1
[default0]: 33: ParallelTransformerLayerPipe
[default0]:stage=32 layers=1
[default0]: 34: ParallelTransformerLayerPipe
[default0]:stage=33 layers=1
[default0]: 35: ParallelTransformerLayerPipe
[default0]:stage=34 layers=1
[default0]: 36: ParallelTransformerLayerPipe
[default0]:stage=35 layers=1
[default0]: 37: ParallelTransformerLayerPipe
[default0]:stage=36 layers=1
[default0]: 38: ParallelTransformerLayerPipe
[default0]:stage=37 layers=1
[default0]: 39: ParallelTransformerLayerPipe
[default0]:stage=38 layers=1
[default0]: 40: ParallelTransformerLayerPipe
[default0]:stage=39 layers=1
[default0]: 41: ParallelTransformerLayerPipe
[default0]:stage=40 layers=1
[default0]: 42: ParallelTransformerLayerPipe
[default0]:stage=41 layers=1
[default0]: 43: ParallelTransformerLayerPipe
[default0]:stage=42 layers=1
[default0]: 44: ParallelTransformerLayerPipe
[default0]:stage=43 layers=1
[default0]: 45: ParallelTransformerLayerPipe
[default0]:stage=44 layers=1
[default0]: 46: ParallelTransformerLayerPipe
[default0]:stage=45 layers=1
[default0]: 47: ParallelTransformerLayerPipe
[default0]:stage=46 layers=1
[default0]: 48: ParallelTransformerLayerPipe
[default0]:stage=47 layers=1
[default0]: 49: ParallelTransformerLayerPipe
[default0]:stage=48 layers=1
[default0]: 50: ParallelTransformerLayerPipe
[default0]:stage=49 layers=1
[default0]: 51: ParallelTransformerLayerPipe
[default0]:stage=50 layers=1
[default0]: 52: ParallelTransformerLayerPipe
[default0]:stage=51 layers=1
[default0]: 53: ParallelTransformerLayerPipe
[default0]:stage=52 layers=1
[default0]: 54: ParallelTransformerLayerPipe
[default0]:stage=53 layers=1
[default0]: 55: ParallelTransformerLayerPipe
[default0]:stage=54 layers=1
[default0]: 56: ParallelTransformerLayerPipe
[default0]:stage=55 layers=1
[default0]: 57: ParallelTransformerLayerPipe
[default0]:stage=56 layers=1
[default0]: 58: ParallelTransformerLayerPipe
[default0]:stage=57 layers=1
[default0]: 59: ParallelTransformerLayerPipe
[default0]:stage=58 layers=1
[default0]: 60: ParallelTransformerLayerPipe
[default0]:stage=59 layers=1
[default0]: 61: ParallelTransformerLayerPipe
[default0]:stage=60 layers=1
[default0]: 62: ParallelTransformerLayerPipe
[default0]:stage=61 layers=1
[default0]: 63: ParallelTransformerLayerPipe
[default0]:stage=62 layers=1
[default0]: 64: ParallelTransformerLayerPipe
[default0]:stage=63 layers=1
[default0]: 65: ParallelTransformerLayerPipe
[default0]:stage=64 layers=1
[default0]: 66: ParallelTransformerLayerPipe
[default0]:stage=65 layers=1
[default0]: 67: ParallelTransformerLayerPipe
[default0]:stage=66 layers=1
[default0]: 68: ParallelTransformerLayerPipe
[default0]:stage=67 layers=1
[default0]: 69: ParallelTransformerLayerPipe
[default0]:stage=68 layers=1
[default0]: 70: ParallelTransformerLayerPipe
[default0]:stage=69 layers=1
[default0]: 71: ParallelTransformerLayerPipe
[default0]:stage=70 layers=3
[default0]: 72: ParallelTransformerLayerPipe
[default0]: 73: undo
[default0]: 74: MixedFusedLayerNorm
[default0]:stage=71 layers=2
[default0]: 75: EmbeddingPipe
[default0]: 76: float16_to_fp32
[default0]: loss: CrossEntropy
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default5]:Building extension module utils...
[default5]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.40546751022338867 seconds
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.40394091606140137 seconds
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.40493011474609375 seconds
[default5]:ninja: no work to do.
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.40519118309020996 seconds
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.01239323616027832 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.01210927963256836 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.012318134307861328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.01190805435180664 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 19:58:26,991] [INFO] [utils.py:827:see_memory_usage] After Building Model [default0]:[2022-09-03 19:58:26,991] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:58:26,992] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.49 GB, percent = 7.3% [default0]:setting training iterations to 3100 [default0]:> learning rate decay style: constant [default0]:DeepSpeed is enabled. [default0]:[2022-09-03 19:58:26,992] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master [default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default2]:Building extension module utils... [default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.41774487495422363 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.34836673736572266 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default7]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3836836814880371 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3837435245513916 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.34850072860717773 seconds [default2]:ninja: no work to do. [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3857100009918213 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3470940589904785 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3475313186645508 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31682801246643066 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.38366198539733887 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31745243072509766 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3184950351715088 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3171849250793457 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3840622901916504 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.41346168518066406 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.41796231269836426 seconds [default1]:Time to load utils op: 0.41794776916503906 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4134399890899658 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.41345858573913574 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3479442596435547 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4134688377380371 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3486502170562744 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.34783029556274414 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.4179348945617676 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3513479232788086 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3477919101715088 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3029756546020508 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3506128787994385 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3538188934326172 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3538188934326172 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30262231826782227 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.41797375679016113 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.350522518157959 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3508763313293457 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3027210235595703 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3026001453399658 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3679006099700928 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.36818528175354004 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3679077625274658 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.42241716384887695 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.35192418098449707 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.4218709468841553 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4218721389770508 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3538248538970947 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3156399726867676 seconds [default3]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3105607032775879 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31694650650024414 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3107125759124756 seconds [default3]:Time to load utils op: 0.3156697750091553 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31145286560058594 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31156182289123535 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35381436347961426 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.311384916305542 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3113536834716797 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3115248680114746 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3025238513946533 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3067598342895508 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.306812047958374 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.306063175201416 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.30663347244262695 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3063185214996338 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.42191386222839355 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3025016784667969 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30266809463500977 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30266523361206055 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3027660846710205 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3028240203857422 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3027796745300293 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3027653694152832 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31113767623901367 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30753254890441895 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.302478551864624 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30265092849731445 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3052983283996582 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3730626106262207 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30275678634643555 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30274486541748047 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3053090572357178 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3052501678466797 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3053169250488281 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30281615257263184 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.35228610038757324 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.35134339332580566 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.35161542892456055 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31446170806884766 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3025951385498047 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.316068172454834 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3142673969268799 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3143124580383301 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31502461433410645 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3026406764984131 seconds [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.32021164894104004 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30695033073425293 seconds [default5]:Time to load utils op: 0.30686092376708984 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3263280391693115 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.3067514896392822 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.32038331031799316 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default0]:Time to load utils op: 0.3101310729980469 seconds [default1]:Time to load utils op: 0.30928730964660645 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31348419189453125 seconds [default2]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3206362724304199 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3135387897491455 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3246626853942871 seconds [default0]:Loading extension module utils... [default2]:Time to load utils op: 0.3134305477142334 seconds [default0]:Time to load utils op: 0.3129606246948242 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30268192291259766 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30260419845581055 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3216991424560547 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3026742935180664 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3216207027435303 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30271410942077637 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3223590850830078 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.32160234451293945 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3261580467224121 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.36792445182800293 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.32576632499694824 seconds [default6]:Time to load utils op: 0.32585763931274414 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31545543670654297 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30756402015686035 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3932666778564453 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30760788917541504 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3922901153564453 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30760955810546875 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3025624752044678 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30274510383605957 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3154623508453369 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.39198827743530273 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3732173442840576 seconds [default2]:Time to load utils op: 0.37331414222717285 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3288393020629883 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.360867977142334 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3301827907562256 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.39304351806640625 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.36294126510620117 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3626248836517334 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31311631202697754 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4046173095703125 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3362443447113037 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3156087398529053 seconds [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30276918411254883 seconds [default2]:Time to load utils op: 0.3026454448699951 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3205718994140625 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3085775375366211 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.40525102615356445 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3367764949798584 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.32468485832214355 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.32468223571777344 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.38143467903137207 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.32469940185546875 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.40518689155578613 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3629481792449951 seconds [default2]:Time to load utils op: 0.30855894088745117 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3371741771697998 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.40508008003234863 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3368241786956787 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3155965805053711 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.38145899772644043 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.34397125244140625 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.40430665016174316 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.40467143058776855 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3931601047515869 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.36266589164733887 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3046762943267822 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3051886558532715 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.34428906440734863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3290371894836426 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.32985997200012207 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3042576313018799 seconds [default6]:Time to load utils op: 0.304546594619751 seconds [default3]:Loading extension module utils... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3596503734588623 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.35965919494628906 seconds [default3]:Time to load utils op: 0.3229532241821289 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.40467214584350586 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.40440797805786133 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.34160709381103516 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3930497169494629 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3161661624908447 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30269718170166016 seconds [default4]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.381272554397583 seconds [default4]:Time to load utils op: 0.3192329406738281 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3183784484863281 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.38097167015075684 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3207089900970459 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3199000358581543 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3017921447753906 seconds [default0]:Time to load utils op: 0.30267763137817383 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3185000419616699 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.32959485054016113 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.39306044578552246 seconds [default3]:Time to load utils op: 0.39304018020629883 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3294949531555176 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.33065009117126465 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.33025383949279785 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.302814245223999 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3065376281738281 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3025987148284912 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3067011833190918 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30682802200317383 seconds [default1]:Time to load utils op: 0.41755056381225586 seconds [default0]:Time to load utils op: 0.4176042079925537 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.34383201599121094 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.34581565856933594 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31595873832702637 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.316148042678833 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3159637451171875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3242490291595459 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3409585952758789 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3164401054382324 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31630825996398926 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31598639488220215 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3163166046142578 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31616950035095215 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30272722244262695 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30276918411254883 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.31647372245788574 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30279040336608887 seconds [default0]:Loading extension module utils... [default6]:Loading extension module utils... [default0]:Time to load utils op: 0.303051233291626 seconds [default6]:Time to load utils op: 0.3409440517425537 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31645965576171875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31278395652770996 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.34113240242004395 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3461644649505615 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.34614062309265137 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31876087188720703 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3023092746734619 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30265116691589355 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30259227752685547 seconds [default7]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3234593868255615 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31221795082092285 seconds [default7]:Time to load utils op: 0.3126797676086426 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.302581787109375 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.34582018852233887 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3438096046447754 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3156418800354004 seconds [default6]:Time to load utils op: 0.31563711166381836 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3152594566345215 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.31650805473327637 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3246753215789795 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3245837688446045 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.32466983795166016 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3730478286743164 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3053414821624756 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.3026273250579834 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.32466769218444824 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30685997009277344 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30632948875427246 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30538177490234375 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30559682846069336 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3065073490142822 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.32009267807006836 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30242133140563965 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.32009339332580566 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.36086368560791016 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.7395608425140381 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.36084556579589844 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3608417510986328 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31644749641418457 seconds [default2]:Time to load utils op: 0.7395613193511963 seconds [default5]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30284953117370605 seconds [default5]:Time to load utils op: 0.3065185546875 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3068056106567383 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.30662965774536133 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.32383227348327637 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30634164810180664 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3594856262207031 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3068397045135498 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3148689270019531 seconds [default3]:Time to load utils op: 0.739558219909668 seconds [default1]:Time to load utils op: 0.7395799160003662 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3144509792327881 seconds [default7]:Time to load utils op: 0.3598287105560303 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3141899108886719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004830360412597656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30632662773132324 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30536913871765137 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3027181625366211 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31438279151916504 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006644725799560547 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009794235229492188 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008153915405273438 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006878376007080078 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004546642303466797 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005290508270263672 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006449222564697266 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006487369537353516 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007054805755615234 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006892681121826172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005855560302734375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000997781753540039 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004696846008300781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008254051208496094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Time to load utils op: 0.0006532669067382812 seconds [default6]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Time to load utils op: 0.0005810260772705078 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004954338073730469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default4]:Time to load utils op: 0.0004715919494628906 seconds [default2]:Time to load utils op: 0.0004353523254394531 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006582736968994141 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006475448608398438 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0006215572357177734 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0017242431640625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007266998291015625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006890296936035156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007386207580566406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006220340728759766 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0006692409515380859 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008921623229980469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006313323974609375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0017817020416259766 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00044035911560058594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006096363067626953 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005013942718505859 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006988048553466797 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005729198455810547 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005865097045898438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008287429809570312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006973743438720703 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0006811618804931641 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006787776947021484 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006477832794189453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005652904510498047 seconds [default5]:Time to load utils op: 0.0006189346313476562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008611679077148438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0006601810455322266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Time to load utils op: 0.0005879402160644531 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005962848663330078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005700588226318359 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006244182586669922 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003426074981689453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006418228149414062 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004010200500488281 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000606536865234375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006966590881347656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006699562072753906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00042819976806640625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004436969757080078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007562637329101562 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007071495056152344 seconds [default3]:Time to load utils op: 0.0008199214935302734 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007789134979248047 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004932880401611328 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005843639373779297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0011878013610839844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006823539733886719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005621910095214844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005812644958496094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007214546203613281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Time to load utils op: 0.000701904296875 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046539306640625 seconds [default2]:Time to load utils op: 0.001192331314086914 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008220672607421875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005571842193603516 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005202293395996094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0009181499481201172 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007510185241699219 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006844997406005859 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005602836608886719 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006971359252929688 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default1]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0006053447723388672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006799697875976562 seconds [default1]:Time to load utils op: 0.0006563663482666016 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006003379821777344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005385875701904297 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005171298980712891 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006144046783447266 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005598068237304688 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005064010620117188 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006794929504394531 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006043910980224609 seconds [default4]:Time to load utils op: 0.0006771087646484375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005795955657958984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004956722259521484 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005502700805664062 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default6]:Time to load utils op: 0.0005776882171630859 seconds [default2]:Time to load utils op: 0.0006330013275146484 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006978511810302734 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006816387176513672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007576942443847656 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005254745483398438 seconds [default3]:Time to load utils op: 0.0007097721099853516 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00077056884765625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009179115295410156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007832050323486328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007040500640869141 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005750656127929688 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005207061767578125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006883144378662109 seconds [default6]:Time to load utils op: 0.0007183551788330078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005726814270019531 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007002353668212891 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007979869842529297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005617141723632812 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007600784301757812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005681514739990234 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006899833679199219 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004661083221435547 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009694099426269531 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008478164672851562 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005145072937011719 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007121562957763672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009615421295166016 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005006790161132812 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.001055002212524414 seconds [default1]:Time to load utils op: 0.00041794776916503906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007288455963134766 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007431507110595703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005326271057128906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006351470947265625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Time to load utils op: 0.0006372928619384766 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Time to load utils op: 0.0004718303680419922 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005915164947509766 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005764961242675781 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00042057037353515625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006039142608642578 seconds [default1]:Time to load utils op: 0.0006678104400634766 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006725788116455078 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006477832794189453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006659030914306641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00035691261291503906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0016520023345947266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006358623504638672 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006935596466064453 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004956722259521484 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004646778106689453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006055831909179688 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000453948974609375 seconds [default1]:Time to load utils op: 0.0004730224609375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007510185241699219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008399486541748047 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0015027523040771484 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005197525024414062 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003933906555175781 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00037670135498046875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00074005126953125 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0015864372253417969 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005311965942382812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0014796257019042969 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008950233459472656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006783008575439453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006139278411865234 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007166862487792969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006971359252929688 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0015687942504882812 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0017056465148925781 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0015735626220703125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006651878356933594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005693435668945312 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005900859832763672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.000507354736328125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0015652179718017578 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006575584411621094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005893707275390625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006508827209472656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008251667022705078 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0006077289581298828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006999969482421875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.001283884048461914 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00046896934509277344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0011646747589111328 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010023117065429688 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.00035190582275390625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006422996520996094 seconds [default3]:Time to load utils op: 0.0010595321655273438 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00034880638122558594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004961490631103516 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00052642822265625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004951953887939453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Time to load utils op: 0.0006008148193359375 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005519390106201172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004930496215820312 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007793903350830078 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006146430969238281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default1]:Time to load utils op: 0.0004839897155761719 seconds [default3]:Time to load utils op: 0.00038313865661621094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006344318389892578 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006074905395507812 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00044798851013183594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005502700805664062 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005526542663574219 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005609989166259766 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004904270172119141 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005862712860107422 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003447532653808594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004963874816894531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005326271057128906 seconds [default2]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005335807800292969 seconds [default2]:Time to load utils op: 0.0004889965057373047 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005464553833007812 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005559921264648438 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005571842193603516 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008921623229980469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009074211120605469 seconds [default0]:Time to load utils op: 0.0006301403045654297 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006296634674072266 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00079345703125 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006871223449707031 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006430149078369141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005915164947509766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007829666137695312 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008325576782226562 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006551742553710938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005848407745361328 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006630420684814453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0009191036224365234 seconds [default1]:Time to load utils op: 0.0008497238159179688 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007603168487548828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006310939788818359 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007812976837158203 seconds [default3]:Time to load utils op: 0.0006673336029052734 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005977153778076172 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0007045269012451172 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006995201110839844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006906986236572266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Time to load utils op: 0.0008184909820556641 seconds [default1]:Time to load utils op: 0.0008513927459716797 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006799697875976562 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0008292198181152344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008718967437744141 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007617473602294922 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006570816040039062 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006153583526611328 seconds [default5]:Time to load utils op: 0.0006108283996582031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005900859832763672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006921291351318359 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007684230804443359 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010139942169189453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008134841918945312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009508132934570312 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0007712841033935547 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0011131763458251953 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008940696716308594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006101131439208984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009293556213378906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006985664367675781 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0008535385131835938 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006170272827148438 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008015632629394531 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005812644958496094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006635189056396484 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005526542663574219 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005712509155273438 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005865097045898438 seconds [default2]:Time to load utils op: 0.000492095947265625 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006127357482910156 seconds [default5]:Time to load utils op: 0.0006358623504638672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0010356903076171875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0005464553833007812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009303092956542969 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008244514465332031 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006203651428222656 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007042884826660156 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005247592926025391 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0006163120269775391 seconds [default0]:[2022-09-03 19:58:27,715] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-03 19:58:27,715] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-03 19:58:27,715] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-03 19:58:27,715] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-03 19:58:27,715] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-03 19:58:27,738] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-03 19:58:27,738] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:58:27,739] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default1]:Building extension module utils... [default1]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:ninja: no work to do. [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.25211167335510254 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3043971061706543 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30457472801208496 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005006790161132812 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3020048141479492 seconds [default0]:[2022-09-03 19:58:28,063] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-03 19:58:28,063] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-03 19:58:28,064] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30879831314086914 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.309145450592041 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3087594509124756 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.30907750129699707 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003476142883300781 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006263256072998047 seconds [default0]:[2022-09-03 19:58:28,121] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-03 19:58:28,122] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:58:28,122] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-03 19:58:28,142] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-03 19:58:28,142] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:58:28,142] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-03 19:58:28,162] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-03 19:58:28,163] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:58:28,163] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-03 19:58:28,193] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-03 19:58:28,193] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 
23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-03 19:58:28,193] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0015685558319091797 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0017490386962890625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0014808177947998047 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0018198490142822266 seconds [default0]:[2022-09-03 19:58:28,251] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-03 19:58:28,251] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:58:28,251] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-03 19:58:28,272] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-03 19:58:28,272] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-03 19:58:28,273] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-03 19:58:28,273] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-03 19:58:28,273] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-03 19:58:28,273] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-03 19:58:28,273] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-03 19:58:28,273] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] load_universal_checkpoint .... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-03 19:58:28,274] [INFO] [config.py:991:print] zero_optimization_stage ...... 0 [default0]:[2022-09-03 19:58:28,275] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005135536193847656 seconds [default0]:[2022-09-03 19:58:28,275] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-03 19:58:28,880] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt... [default0]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt. [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt... [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt... [default0]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt. [default0]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt. 
[default1]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt... [default5]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt. [default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt... [default3]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt... [default1]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt... [default2]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_06_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt... 
[default4]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt... [default6]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt... [default7]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_07_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt... [default3]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt. 
[default3]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt... [default4]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt. [default4]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt... [default6]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt. [default6]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt... [default2]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_22_model_states.pt. [default2]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt... 
[default7]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt. [default7]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt... [default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_23_model_states.pt. [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_62_model_states.pt. [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt... [default1]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt. [default1]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt... 
[default2]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt. [default2]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt... [default0]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt. [default0]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt... [default3]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt. [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt... [default1]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt. 
[default1]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt... [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt. [default4]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt... [default0]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt. 
[default0]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt... [default2]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt. [default2]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt... 
[default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt. [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt. [default7]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt. [default6]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt... 
[default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt... [default0]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt. [default0]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt... [default3]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt. 
[default3]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt. [default2]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt... [default2]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_20_model_states.pt. [default2]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... 
[default4]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt. [default4]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt... 
[default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt. [default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_63_model_states.pt. [default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt... [default3]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_26_model_states.pt. 
[default3]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt... [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt. [default0]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt... [default1]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt. [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt. [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt... [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_52_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt. [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt... [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt. 
[default3]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt... [default2]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt. [default2]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt... [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt. [default0]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt... [default4]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt. [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_21_model_states.pt. 
[default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt... [default0]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt... 
[default0]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt. [default7]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt... [default1]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt. 
[default1]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt. [default2]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt... 
[default2]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt. [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt... [default3]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_42_model_states.pt. [default3]:[2022-09-03 19:58:29,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt. [default6]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt... 
[default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt. [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt... [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt... 
[default1]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_12_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt. [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt... [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt. 
[default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt... [default5]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_13_model_states.pt. [default5]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt... [default7]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt. [default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt... [default6]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_27_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt... [default1]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt... [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt... [default7]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt... [default6]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt. [default6]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_28_model_states.pt. [default1]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt... [default4]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt. [default4]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt... [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt. [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt... 
[default4]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt... [default6]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt. [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt. [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt... [default5]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt. [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_53_model_states.pt. [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt... 
[default4]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt. [default4]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt... [default1]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt. [default3]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt... [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt... [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt. [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt. 
[default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt... [default7]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt... [default7]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt. 
[default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt... 
[default1]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt... [default3]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt... [default2]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt. 
[default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt... [default6]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt... 
[default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt... [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt. 
[default5]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt. [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt... 
[default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_54_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt... [default3]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_30_model_states.pt. [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt. [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_55_model_states.pt. [default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt. 
[default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt... [default7]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt... [default6]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_31_model_states.pt. 
[default6]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt... [default6]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt. [default6]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt... [default7]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt. [default7]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... 
[default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt... [default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_43_model_states.pt. [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_24_model_states.pt. [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt... [default2]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt... 
[default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt. [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt... [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt. [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt. [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt... [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt. 
[default2]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt. [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt... [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt... [default4]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt. [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt... [default0]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt. [default6]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt. [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt... [default7]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt. [default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt... 
[default2]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt. [default2]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt... [default6]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt... [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt. [default7]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt... 
[default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt... [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_44_model_states.pt. [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt. [default1]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt... [default1]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt. [default2]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_45_model_states.pt. [default1]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt... [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt. [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt. [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt... [default4]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_17_model_states.pt. 
[default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt... [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt... [default0]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt. [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_16_model_states.pt. [default3]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... 
[default0]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_25_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_48_model_states.pt. [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt... [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt. [default4]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt... [default2]:[2022-09-03 19:58:29,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt. [default3]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt... [default3]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt. 
[default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt. 
[default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt... [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt. [default3]:[2022-09-03 19:58:29,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt. [default3]:[2022-09-03 19:58:29,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt. [default5]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt... [default6]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt. 
[default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt... [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_03_model_states.pt. [default5]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt... [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_49_model_states.pt. 
[default7]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt... [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt. [default5]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt... [default3]:[2022-09-03 19:58:29,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt. [default3]:[2022-09-03 19:58:29,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt. [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt... [default2]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt... 
[default5]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt. [default5]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt... [default1]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_40_model_states.pt. [default1]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt... [default4]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt... [default2]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt. 
[default2]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt... [default0]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt. [default0]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... 
[default4]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt... 
[default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt. [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt... 
[default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt... 
[default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_18_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt... [default6]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt. 
[default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt. [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt... [default4]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt... [default6]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt... 
[default7]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt. [default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt... [default3]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt. [default3]:[2022-09-03 19:58:29,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt... [default1]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_56_model_states.pt. [default1]:[2022-09-03 19:58:29,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt... [default4]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt... [default4]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt. [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_57_model_states.pt. [default5]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt. [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt... [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_10_model_states.pt. [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt... [default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt. [default4]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt... [default7]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt. [default7]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt. 
[default0]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt... [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt. [default0]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt... [default0]:[2022-09-03 19:58:29,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt... [default1]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt. [default1]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt... [default2]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt. [default2]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt... [default4]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt... [default2]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default0]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt... [default2]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt... [default1]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt. [default2]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt... [default5]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt. [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt. [default6]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_08_model_states.pt. [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt. [default6]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt... [default6]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt... [default7]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt. [default6]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt... [default7]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt... [default4]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt. [default1]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt. [default1]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt... 
[default5]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_37_model_states.pt. [default5]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt... [default6]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_41_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt... [default5]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_19_model_states.pt. [default2]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt... [default2]:[2022-09-03 19:58:29,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt. 
[default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt... [default0]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt. [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default1]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt... [default1]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt. [default6]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt. [default1]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt. [default3]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt... [default1]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt. [default5]:[2022-09-03 19:58:29,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt. [default7]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt... [default7]:[2022-09-03 19:58:29,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_39_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_46_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_47_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt... [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_11_model_states.pt. 
[default6]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt... 
[default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt. [default6]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt. [default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt... [default7]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt. [default7]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt... [default3]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt. [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt... 
[default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt. [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt... [default4]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_35_model_states.pt. [default4]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt... [default4]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt. [default4]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt... [default7]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt. 
[default7]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt... [default4]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt. [default4]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt. [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt... [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt... [default2]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_68_model_states.pt. [default3]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt... 
[default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_34_model_states.pt. [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt... [default6]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_09_model_states.pt. [default6]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_58_model_states.pt. 
[default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt... [default6]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt. [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt. [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt... [default3]:[2022-09-03 19:58:29,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_36_model_states.pt. [default3]:[2022-09-03 19:58:29,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt... [default5]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_69_model_states.pt. [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt... 
[default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt. [default6]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt. [default6]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt. 
[default5]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt. [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default1]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt. [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_61_model_states.pt. [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt... 
[default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_60_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt... [default3]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_50_model_states.pt. [default3]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_04_model_states.pt. [default2]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt... [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt. 
[default2]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt... [default2]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt. [default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default0]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt... [default0]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt. [default0]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt... [default6]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt. [default6]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt... 
[default1]:[2022-09-03 19:58:29,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_32_model_states.pt. [default1]:[2022-09-03 19:58:29,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt. [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt... [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt. 
[default4]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_59_model_states.pt. [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default4]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt... [default4]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt... [default5]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt... [default1]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt. [default1]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt... 
[default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_70_model_states.pt. [default3]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt... [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt. [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt. [default7]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt... [default7]:[2022-09-03 19:58:29,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_33_model_states.pt. [default7]:[2022-09-03 19:58:29,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt... [default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt. [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt... [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_71_model_states.pt. [default0]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default0]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt... [default0]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt. [default0]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default2]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default1]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt... [default1]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt. [default1]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default6]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt... [default6]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt... [default4]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt. [default6]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt. [default4]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... 
[default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt... [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt... 
[default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt. [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_65_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_64_model_states.pt. 
[default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt... [default7]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt. [default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default7]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt... [default7]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_05_model_states.pt. [default7]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_15_model_states.pt. [default7]:[2022-09-03 19:58:29,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt... [default5]:[2022-09-03 19:58:29,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_01_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt... [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt... [default2]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default2]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt... [default2]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt. [default2]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_66_model_states.pt. [default3]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default6]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt. [default4]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default4]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt... [default4]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt. [default4]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default3]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default3]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt... [default3]:[2022-09-03 19:58:29,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_14_model_states.pt. [default7]:[2022-09-03 19:58:29,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default7]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. 
[default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt... [default7]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt. [default5]:[2022-09-03 19:58:29,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_67_model_states.pt. [default3]:[2022-09-03 19:58:29,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:29,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default6]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt... [default6]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt. [default6]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default7]:[2022-09-03 19:58:29,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt... [default7]:[2022-09-03 19:58:29,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_51_model_states.pt. 
[default7]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt... [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_00_model_states.pt. [default5]:[2022-09-03 19:58:29,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt... [default5]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt. [default7]:[2022-09-03 19:58:29,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_29_model_states.pt. [default7]:[2022-09-03 19:58:29,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default5]:[2022-09-03 19:58:29,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:29,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt. [default2]:[2022-09-03 19:58:29,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt. [default0]:[2022-09-03 19:58:29,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default1]:[2022-09-03 19:58:29,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt. [default1]:[2022-09-03 19:58:29,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default3]:[2022-09-03 19:58:29,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_38_model_states.pt. [default3]:[2022-09-03 19:58:29,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:29,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt. [default1]:[2022-09-03 19:58:29,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default0]:[2022-09-03 19:58:29,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/mp_rank_02_model_states.pt. [default0]:[2022-09-03 19:58:29,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default2]:[2022-09-03 19:58:29,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default2]:[2022-09-03 19:58:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default1]:[2022-09-03 19:58:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default1]:[2022-09-03 19:58:31,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:31,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default5]:[2022-09-03 19:58:31,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default5]:[2022-09-03 19:58:31,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default4]:[2022-09-03 19:58:31,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default4]:[2022-09-03 19:58:31,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default5]:[2022-09-03 19:58:31,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default6]:[2022-09-03 19:58:31,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:32,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:32,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:32,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:32,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:32,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:32,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:32,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:32,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:32,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:32,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:32,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... 
[default0]:[2022-09-03 19:58:32,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:32,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:32,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:32,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:32,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:32,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:32,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:32,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:32,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default5]:[2022-09-03 19:58:32,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:32,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default6]:[2022-09-03 19:58:32,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default7]:[2022-09-03 19:58:32,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default5]:[2022-09-03 19:58:32,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:32,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default3]:[2022-09-03 19:58:32,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default1]:[2022-09-03 19:58:32,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default1]:[2022-09-03 19:58:32,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:32,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default0]:[2022-09-03 19:58:32,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default0]:[2022-09-03 19:58:32,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... [default2]:[2022-09-03 19:58:32,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default2]:[2022-09-03 19:58:32,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default3]:[2022-09-03 19:58:32,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt... 
[default6]:[2022-09-03 19:58:32,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default6]:[2022-09-03 19:58:32,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default7]:[2022-09-03 19:58:32,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt... [default4]:[2022-09-03 19:58:32,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default4]:[2022-09-03 19:58:32,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... 
[default3]:[2022-09-03 19:58:32,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_72-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:33,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:33,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:33,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:33,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:33,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:33,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:33,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:33,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:33,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:33,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:33,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... 
[default4]:[2022-09-03 19:58:33,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:33,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:33,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:33,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:33,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:33,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:33,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:33,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:33,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default2]:[2022-09-03 19:58:33,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... 
[default0]:[2022-09-03 19:58:33,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default0]:[2022-09-03 19:58:33,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default6]:[2022-09-03 19:58:33,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:33,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default1]:[2022-09-03 19:58:33,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default1]:[2022-09-03 19:58:33,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default7]:[2022-09-03 19:58:33,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default4]:[2022-09-03 19:58:33,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:33,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default5]:[2022-09-03 19:58:33,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default7]:[2022-09-03 19:58:33,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt... [default5]:[2022-09-03 19:58:33,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:33,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default2]:[2022-09-03 19:58:33,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt... [default6]:[2022-09-03 19:58:33,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:33,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt... [default3]:[2022-09-03 19:58:33,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_74-model_00-model_states.pt. [default3]:[2022-09-03 19:58:33,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default4]:[2022-09-03 19:58:34,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default0]:[2022-09-03 19:58:33,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_50-model_00-model_states.pt. [default6]:[2022-09-03 19:58:33,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default4]:[2022-09-03 19:58:33,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:33,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default7]:[2022-09-03 19:58:33,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_51-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default4]:[2022-09-03 19:58:34,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... [default5]:[2022-09-03 19:58:34,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt... 
[default5]:[2022-09-03 19:58:33,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default0]:[2022-09-03 19:58:34,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default5]:[2022-09-03 19:58:34,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default4]:[2022-09-03 19:58:34,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:34,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt... [default5]:[2022-09-03 19:58:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... 
[default7]:[2022-09-03 19:58:34,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_47-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... [default2]:[2022-09-03 19:58:34,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:34,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default2]:[2022-09-03 19:58:34,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt... [default0]:[2022-09-03 19:58:34,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default2]:[2022-09-03 19:58:34,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... 
[default1]:[2022-09-03 19:58:34,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default4]:[2022-09-03 19:58:34,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt... [default3]:[2022-09-03 19:58:34,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt... [default0]:[2022-09-03 19:58:34,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default1]:[2022-09-03 19:58:34,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt... [default1]:[2022-09-03 19:58:34,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default1]:[2022-09-03 19:58:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt... [default4]:[2022-09-03 19:58:34,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:34,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default5]:[2022-09-03 19:58:34,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default0]:[2022-09-03 19:58:34,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:34,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default3]:[2022-09-03 19:58:34,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default4]:[2022-09-03 19:58:34,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default5]:[2022-09-03 19:58:34,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt... [default6]:[2022-09-03 19:58:34,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default7]:[2022-09-03 19:58:34,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:34,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt... [default6]:[2022-09-03 19:58:34,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default4]:[2022-09-03 19:58:34,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:34,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default6]:[2022-09-03 19:58:34,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default0]:[2022-09-03 19:58:34,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt... [default2]:[2022-09-03 19:58:34,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... 
[default1]:[2022-09-03 19:58:34,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default3]:[2022-09-03 19:58:34,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_70-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default0]:[2022-09-03 19:58:34,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default7]:[2022-09-03 19:58:34,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... 
[default4]:[2022-09-03 19:58:34,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt... [default3]:[2022-09-03 19:58:34,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_68-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... [default7]:[2022-09-03 19:58:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_21-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default0]:[2022-09-03 19:58:34,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_40-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default1]:[2022-09-03 19:58:34,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:34,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default2]:[2022-09-03 19:58:34,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default5]:[2022-09-03 19:58:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default5]:[2022-09-03 19:58:34,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default5]:[2022-09-03 19:58:34,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:34,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt... [default7]:[2022-09-03 19:58:34,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default0]:[2022-09-03 19:58:34,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default1]:[2022-09-03 19:58:34,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default2]:[2022-09-03 19:58:34,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default2]:[2022-09-03 19:58:34,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_05-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:34,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default6]:[2022-09-03 19:58:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_71-model_00-model_states.pt. [default7]:[2022-09-03 19:58:34,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default1]:[2022-09-03 19:58:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default1]:[2022-09-03 19:58:35,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default3]:[2022-09-03 19:58:34,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_12-model_00-model_states.pt. [default4]:[2022-09-03 19:58:34,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... 
[default2]:[2022-09-03 19:58:35,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default1]:[2022-09-03 19:58:35,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default2]:[2022-09-03 19:58:35,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default1]:[2022-09-03 19:58:35,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:35,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default3]:[2022-09-03 19:58:35,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt... [default0]:[2022-09-03 19:58:35,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default6]:[2022-09-03 19:58:35,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default7]:[2022-09-03 19:58:35,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:35,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default2]:[2022-09-03 19:58:35,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_18-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default1]:[2022-09-03 19:58:35,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:35,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_44-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_28-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default3]:[2022-09-03 19:58:35,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:35,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default6]:[2022-09-03 19:58:35,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default6]:[2022-09-03 19:58:35,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default7]:[2022-09-03 19:58:35,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt... [default0]:[2022-09-03 19:58:35,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default2]:[2022-09-03 19:58:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default7]:[2022-09-03 19:58:35,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... [default1]:[2022-09-03 19:58:35,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_46-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_57-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default3]:[2022-09-03 19:58:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default7]:[2022-09-03 19:58:35,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default4]:[2022-09-03 19:58:35,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_53-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default3]:[2022-09-03 19:58:35,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default3]:[2022-09-03 19:58:35,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_20-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default2]:[2022-09-03 19:58:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... 
[default7]:[2022-09-03 19:58:35,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default6]:[2022-09-03 19:58:35,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:35,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_54-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default1]:[2022-09-03 19:58:35,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_30-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default5]:[2022-09-03 19:58:35,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default7]:[2022-09-03 19:58:35,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:35,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_61-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default2]:[2022-09-03 19:58:35,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default7]:[2022-09-03 19:58:35,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_15-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:35,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_67-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default3]:[2022-09-03 19:58:35,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default1]:[2022-09-03 19:58:35,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... 
[default2]:[2022-09-03 19:58:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt... [default7]:[2022-09-03 19:58:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default1]:[2022-09-03 19:58:35,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default0]:[2022-09-03 19:58:35,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default3]:[2022-09-03 19:58:35,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_66-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... 
[default0]:[2022-09-03 19:58:35,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default1]:[2022-09-03 19:58:35,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default6]:[2022-09-03 19:58:35,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default2]:[2022-09-03 19:58:35,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... [default1]:[2022-09-03 19:58:35,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... [default0]:[2022-09-03 19:58:35,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... [default3]:[2022-09-03 19:58:35,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... 
[default6]:[2022-09-03 19:58:35,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default7]:[2022-09-03 19:58:35,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_31-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default3]:[2022-09-03 19:58:35,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_52-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default3]:[2022-09-03 19:58:35,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... 
[default7]:[2022-09-03 19:58:35,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_69-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default6]:[2022-09-03 19:58:35,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default5]:[2022-09-03 19:58:35,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... [default6]:[2022-09-03 19:58:35,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default7]:[2022-09-03 19:58:35,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... 
[default2]:[2022-09-03 19:58:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default3]:[2022-09-03 19:58:35,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default4]:[2022-09-03 19:58:35,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default4]:[2022-09-03 19:58:35,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default5]:[2022-09-03 19:58:35,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_11-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... 
[default1]:[2022-09-03 19:58:35,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default0]:[2022-09-03 19:58:35,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default2]:[2022-09-03 19:58:35,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default0]:[2022-09-03 19:58:35,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... [default6]:[2022-09-03 19:58:35,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... [default7]:[2022-09-03 19:58:35,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default7]:[2022-09-03 19:58:35,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... 
[default4]:[2022-09-03 19:58:35,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... [default4]:[2022-09-03 19:58:35,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default0]:[2022-09-03 19:58:35,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default1]:[2022-09-03 19:58:35,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default6]:[2022-09-03 19:58:35,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default7]:[2022-09-03 19:58:35,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default1]:[2022-09-03 19:58:35,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:35,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default3]:[2022-09-03 19:58:36,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default6]:[2022-09-03 19:58:35,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default2]:[2022-09-03 19:58:35,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... [default5]:[2022-09-03 19:58:35,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... [default5]:[2022-09-03 19:58:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default6]:[2022-09-03 19:58:35,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. 
[default6]:[2022-09-03 19:58:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default4]:[2022-09-03 19:58:35,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default0]:[2022-09-03 19:58:35,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default3]:[2022-09-03 19:58:36,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default4]:[2022-09-03 19:58:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default1]:[2022-09-03 19:58:35,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default5]:[2022-09-03 19:58:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... 
[default4]:[2022-09-03 19:58:35,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default5]:[2022-09-03 19:58:35,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_63-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default3]:[2022-09-03 19:58:36,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_22-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default3]:[2022-09-03 19:58:36,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_14-model_00-model_states.pt. [default4]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default1]:[2022-09-03 19:58:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default1]:[2022-09-03 19:58:36,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default4]:[2022-09-03 19:58:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_55-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:36,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default7]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_45-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... [default6]:[2022-09-03 19:58:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default4]:[2022-09-03 19:58:36,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:36,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default7]:[2022-09-03 19:58:36,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_49-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_60-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_36-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default4]:[2022-09-03 19:58:36,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_17-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default2]:[2022-09-03 19:58:36,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default3]:[2022-09-03 19:58:36,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt... [default4]:[2022-09-03 19:58:36,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... 
[default0]:[2022-09-03 19:58:36,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default6]:[2022-09-03 19:58:36,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default3]:[2022-09-03 19:58:36,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default3]:[2022-09-03 19:58:36,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default7]:[2022-09-03 19:58:36,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default0]:[2022-09-03 19:58:36,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default2]:[2022-09-03 19:58:36,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... 
[default4]:[2022-09-03 19:58:36,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... [default3]:[2022-09-03 19:58:36,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default3]:[2022-09-03 19:58:36,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... [default5]:[2022-09-03 19:58:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default0]:[2022-09-03 19:58:36,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default1]:[2022-09-03 19:58:36,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:36,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... [default1]:[2022-09-03 19:58:36,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default6]:[2022-09-03 19:58:36,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default3]:[2022-09-03 19:58:36,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default6]:[2022-09-03 19:58:36,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default6]:[2022-09-03 19:58:36,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... 
[default3]:[2022-09-03 19:58:36,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default0]:[2022-09-03 19:58:36,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default4]:[2022-09-03 19:58:36,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... 
[default3]:[2022-09-03 19:58:36,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default0]:[2022-09-03 19:58:36,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default0]:[2022-09-03 19:58:36,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default7]:[2022-09-03 19:58:36,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default6]:[2022-09-03 19:58:36,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default5]:[2022-09-03 19:58:36,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default4]:[2022-09-03 19:58:36,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... 
[default6]:[2022-09-03 19:58:36,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default6]:[2022-09-03 19:58:36,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default5]:[2022-09-03 19:58:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default4]:[2022-09-03 19:58:36,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default7]:[2022-09-03 19:58:36,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default7]:[2022-09-03 19:58:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... 
[default1]:[2022-09-03 19:58:36,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default5]:[2022-09-03 19:58:36,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default6]:[2022-09-03 19:58:36,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default4]:[2022-09-03 19:58:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default0]:[2022-09-03 19:58:36,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... 
[default4]:[2022-09-03 19:58:36,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default7]:[2022-09-03 19:58:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default7]:[2022-09-03 19:58:36,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... 
[default1]:[2022-09-03 19:58:36,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default4]:[2022-09-03 19:58:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_38-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_37-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default4]:[2022-09-03 19:58:36,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default5]:[2022-09-03 19:58:36,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default6]:[2022-09-03 19:58:36,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_25-model_00-model_states.pt. [default3]:[2022-09-03 19:58:36,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... 
[default7]:[2022-09-03 19:58:36,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default6]:[2022-09-03 19:58:36,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default4]:[2022-09-03 19:58:36,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_23-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default2]:[2022-09-03 19:58:36,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default3]:[2022-09-03 19:58:36,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_42-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_56-model_00-model_states.pt. [default0]:[2022-09-03 19:58:36,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default2]:[2022-09-03 19:58:36,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_62-model_00-model_states.pt. [default1]:[2022-09-03 19:58:36,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_34-model_00-model_states.pt. [default7]:[2022-09-03 19:58:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default1]:[2022-09-03 19:58:36,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... 
[default2]:[2022-09-03 19:58:36,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... [default3]:[2022-09-03 19:58:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... [default6]:[2022-09-03 19:58:36,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default0]:[2022-09-03 19:58:36,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default4]:[2022-09-03 19:58:36,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default5]:[2022-09-03 19:58:36,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... [default3]:[2022-09-03 19:58:36,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_16-model_00-model_states.pt. 
[default0]:[2022-09-03 19:58:36,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... [default7]:[2022-09-03 19:58:37,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default6]:[2022-09-03 19:58:37,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default3]:[2022-09-03 19:58:37,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default3]:[2022-09-03 19:58:37,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default2]:[2022-09-03 19:58:37,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default4]:[2022-09-03 19:58:37,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... 
[default5]:[2022-09-03 19:58:37,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... [default1]:[2022-09-03 19:58:37,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... [default4]:[2022-09-03 19:58:37,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default4]:[2022-09-03 19:58:37,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default5]:[2022-09-03 19:58:37,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default5]:[2022-09-03 19:58:37,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default1]:[2022-09-03 19:58:37,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. 
[default1]:[2022-09-03 19:58:37,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default0]:[2022-09-03 19:58:37,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default1]:[2022-09-03 19:58:37,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default7]:[2022-09-03 19:58:37,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... [default7]:[2022-09-03 19:58:37,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default6]:[2022-09-03 19:58:37,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default6]:[2022-09-03 19:58:37,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... 
[default2]:[2022-09-03 19:58:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default2]:[2022-09-03 19:58:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default5]:[2022-09-03 19:58:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default4]:[2022-09-03 19:58:37,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default4]:[2022-09-03 19:58:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default5]:[2022-09-03 19:58:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default7]:[2022-09-03 19:58:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_29-model_00-model_states.pt. [default7]:[2022-09-03 19:58:37,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... 
[default5]:[2022-09-03 19:58:37,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... [default4]:[2022-09-03 19:58:37,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default3]:[2022-09-03 19:58:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default2]:[2022-09-03 19:58:37,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default1]:[2022-09-03 19:58:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_58-model_00-model_states.pt. [default2]:[2022-09-03 19:58:37,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default3]:[2022-09-03 19:58:37,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... 
[default1]:[2022-09-03 19:58:37,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default4]:[2022-09-03 19:58:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default6]:[2022-09-03 19:58:37,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... [default1]:[2022-09-03 19:58:37,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... [default3]:[2022-09-03 19:58:37,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default6]:[2022-09-03 19:58:37,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default7]:[2022-09-03 19:58:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default0]:[2022-09-03 19:58:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... 
[default5]:[2022-09-03 19:58:37,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_35-model_00-model_states.pt. [default2]:[2022-09-03 19:58:37,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default3]:[2022-09-03 19:58:37,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_06-model_00-model_states.pt. [default1]:[2022-09-03 19:58:37,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default0]:[2022-09-03 19:58:37,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default3]:[2022-09-03 19:58:37,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default7]:[2022-09-03 19:58:37,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... [default7]:[2022-09-03 19:58:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_19-model_00-model_states.pt. [default0]:[2022-09-03 19:58:37,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default1]:[2022-09-03 19:58:37,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default7]:[2022-09-03 19:58:37,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default4]:[2022-09-03 19:58:37,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default5]:[2022-09-03 19:58:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default4]:[2022-09-03 19:58:37,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. [default5]:[2022-09-03 19:58:37,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_43-model_00-model_states.pt. 
[default2]:[2022-09-03 19:58:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_10-model_00-model_states.pt. [default6]:[2022-09-03 19:58:37,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default2]:[2022-09-03 19:58:37,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_24-model_00-model_states.pt. [default7]:[2022-09-03 19:58:37,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default1]:[2022-09-03 19:58:37,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_04-model_00-model_states.pt. [default3]:[2022-09-03 19:58:37,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default6]:[2022-09-03 19:58:37,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_59-model_00-model_states.pt. [default1]:[2022-09-03 19:58:37,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... 
[default4]:[2022-09-03 19:58:37,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default2]:[2022-09-03 19:58:37,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default7]:[2022-09-03 19:58:37,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default2]:[2022-09-03 19:58:37,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... [default5]:[2022-09-03 19:58:37,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default3]:[2022-09-03 19:58:37,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default7]:[2022-09-03 19:58:37,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... 
[default6]:[2022-09-03 19:58:37,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default0]:[2022-09-03 19:58:37,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... [default6]:[2022-09-03 19:58:37,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default2]:[2022-09-03 19:58:37,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... [default3]:[2022-09-03 19:58:37,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default6]:[2022-09-03 19:58:37,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default7]:[2022-09-03 19:58:37,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... 
[default4]:[2022-09-03 19:58:37,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default5]:[2022-09-03 19:58:38,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... [default2]:[2022-09-03 19:58:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default1]:[2022-09-03 19:58:38,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default3]:[2022-09-03 19:58:37,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default4]:[2022-09-03 19:58:38,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default5]:[2022-09-03 19:58:38,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... 
[default3]:[2022-09-03 19:58:38,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default1]:[2022-09-03 19:58:37,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default6]:[2022-09-03 19:58:37,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... [default1]:[2022-09-03 19:58:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default7]:[2022-09-03 19:58:37,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default5]:[2022-09-03 19:58:38,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default0]:[2022-09-03 19:58:38,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... 
[default4]:[2022-09-03 19:58:38,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... [default5]:[2022-09-03 19:58:38,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default0]:[2022-09-03 19:58:38,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... [default5]:[2022-09-03 19:58:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default4]:[2022-09-03 19:58:38,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default4]:[2022-09-03 19:58:38,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default4]:[2022-09-03 19:58:38,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. 
[default5]:[2022-09-03 19:58:38,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_65-model_00-model_states.pt. [default0]:[2022-09-03 19:58:38,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default2]:[2022-09-03 19:58:38,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default1]:[2022-09-03 19:58:38,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default4]:[2022-09-03 19:58:38,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default3]:[2022-09-03 19:58:38,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_32-model_00-model_states.pt. [default1]:[2022-09-03 19:58:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default7]:[2022-09-03 19:58:38,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... 
[default4]:[2022-09-03 19:58:38,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default7]:[2022-09-03 19:58:38,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default6]:[2022-09-03 19:58:38,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default5]:[2022-09-03 19:58:38,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default5]:[2022-09-03 19:58:38,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default7]:[2022-09-03 19:58:38,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default0]:[2022-09-03 19:58:38,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default4]:[2022-09-03 19:58:38,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... 
[default5]:[2022-09-03 19:58:38,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default1]:[2022-09-03 19:58:38,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default3]:[2022-09-03 19:58:38,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_48-model_00-model_states.pt. [default4]:[2022-09-03 19:58:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default6]:[2022-09-03 19:58:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_33-model_00-model_states.pt. [default4]:[2022-09-03 19:58:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default6]:[2022-09-03 19:58:38,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default5]:[2022-09-03 19:58:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:38,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_09-model_00-model_states.pt. [default0]:[2022-09-03 19:58:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default1]:[2022-09-03 19:58:38,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default2]:[2022-09-03 19:58:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default3]:[2022-09-03 19:58:38,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default7]:[2022-09-03 19:58:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_41-model_00-model_states.pt. [default7]:[2022-09-03 19:58:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default6]:[2022-09-03 19:58:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. 
[default7]:[2022-09-03 19:58:38,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_13-model_00-model_states.pt. [default3]:[2022-09-03 19:58:38,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... [default4]:[2022-09-03 19:58:38,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default5]:[2022-09-03 19:58:38,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default5]:[2022-09-03 19:58:38,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default6]:[2022-09-03 19:58:38,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default7]:[2022-09-03 19:58:38,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_27-model_00-model_states.pt. [default7]:[2022-09-03 19:58:38,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... 
[default1]:[2022-09-03 19:58:38,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default0]:[2022-09-03 19:58:38,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default2]:[2022-09-03 19:58:38,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... [default4]:[2022-09-03 19:58:38,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default5]:[2022-09-03 19:58:38,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default5]:[2022-09-03 19:58:38,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_07-model_00-model_states.pt. [default1]:[2022-09-03 19:58:38,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. 
[default4]:[2022-09-03 19:58:38,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... [default3]:[2022-09-03 19:58:38,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default6]:[2022-09-03 19:58:38,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... [default7]:[2022-09-03 19:58:38,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... [default4]:[2022-09-03 19:58:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default5]:[2022-09-03 19:58:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_39-model_00-model_states.pt. [default7]:[2022-09-03 19:58:38,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default6]:[2022-09-03 19:58:38,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... 
[default7]:[2022-09-03 19:58:38,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default1]:[2022-09-03 19:58:38,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default2]:[2022-09-03 19:58:39,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default0]:[2022-09-03 19:58:38,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default2]:[2022-09-03 19:58:38,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_26-model_00-model_states.pt. [default4]:[2022-09-03 19:58:39,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default5]:[2022-09-03 19:58:39,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default0]:[2022-09-03 19:58:39,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:39,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_08-model_00-model_states.pt. [default5]:[2022-09-03 19:58:39,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default6]:[2022-09-03 19:58:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default7]:[2022-09-03 19:58:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default4]:[2022-09-03 19:58:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default5]:[2022-09-03 19:58:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_03-model_00-model_states.pt. [default0]:[2022-09-03 19:58:39,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default1]:[2022-09-03 19:58:39,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... 
[default3]:[2022-09-03 19:58:39,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default6]:[2022-09-03 19:58:39,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default7]:[2022-09-03 19:58:39,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default4]:[2022-09-03 19:58:39,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default5]:[2022-09-03 19:58:39,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default7]:[2022-09-03 19:58:39,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default6]:[2022-09-03 19:58:39,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... 
[default4]:[2022-09-03 19:58:39,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default5]:[2022-09-03 19:58:39,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default2]:[2022-09-03 19:58:39,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default1]:[2022-09-03 19:58:40,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default3]:[2022-09-03 19:58:40,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... [default3]:[2022-09-03 19:58:40,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default2]:[2022-09-03 19:58:40,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... 
[default0]:[2022-09-03 19:58:40,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... [default0]:[2022-09-03 19:58:40,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-03 19:58:40,211] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 280 [default0]:[2022-09-03 19:58:40,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default6]:[2022-09-03 19:58:40,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default7]:[2022-09-03 19:58:40,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... [default1]:[2022-09-03 19:58:40,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default5]:[2022-09-03 19:58:40,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... 
[default4]:[2022-09-03 19:58:40,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... [default0]:[2022-09-03 19:58:40,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_64-model_00-model_states.pt. [default4]:[2022-09-03 19:58:40,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default5]:[2022-09-03 19:58:40,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... [default6]:[2022-09-03 19:58:40,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default7]:[2022-09-03 19:58:40,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default0]:[2022-09-03 19:58:40,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... 
[default1]:[2022-09-03 19:58:40,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. [default1]:[2022-09-03 19:58:40,768] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 281 [default0]:[2022-09-03 19:58:40,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. [default0]:[2022-09-03 19:58:40,843] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 216 [default2]:[2022-09-03 19:58:40,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. [default2]:[2022-09-03 19:58:40,883] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 282 [default3]:[2022-09-03 19:58:41,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-03 19:58:41,147] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 283 [default0]:[2022-09-03 19:58:41,312] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 280 [default5]:[2022-09-03 19:58:41,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. 
[default5]:[2022-09-03 19:58:41,617] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 221 [default2]:[2022-09-03 19:58:41,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. [default2]:[2022-09-03 19:58:41,719] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 194 [default2]:[2022-09-03 19:58:41,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-03 19:58:41,748] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 202 [default1]:[2022-09-03 19:58:41,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-03 19:58:41,864] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 73 [default0]:[2022-09-03 19:58:41,984] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 216 [default1]:[2022-09-03 19:58:41,985] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 281 [default2]:[2022-09-03 19:58:42,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. 
[default3]:[2022-09-03 19:58:42,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default0]:[2022-09-03 19:58:42,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default1]:[2022-09-03 19:58:42,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/layer_01-model_00-model_states.pt. [default3]:[2022-09-03 19:58:42,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. [default3]:[2022-09-03 19:58:42,147] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 155 [default4]:[2022-09-03 19:58:42,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. [default4]:[2022-09-03 19:58:42,200] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 20 [default7]:[2022-09-03 19:58:42,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. 
[default7]:[2022-09-03 19:58:42,250] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 263 [default2]:[2022-09-03 19:58:42,314] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 282 [default6]:[2022-09-03 19:58:42,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. [default6]:[2022-09-03 19:58:42,364] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 254 [default3]:[2022-09-03 19:58:42,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-03 19:58:42,379] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 75 [default3]:[2022-09-03 19:58:42,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-03 19:58:42,481] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 251 [default3]:[2022-09-03 19:58:42,486] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 283 [default2]:[2022-09-03 19:58:42,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. 
[default2]:[2022-09-03 19:58:42,700] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 74 [default0]:[2022-09-03 19:58:42,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-03 19:58:42,802] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 192 [default0]:[2022-09-03 19:58:42,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-03 19:58:42,780] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 72 [default5]:[2022-09-03 19:58:42,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. [default5]:[2022-09-03 19:58:42,836] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 205 [default2]:[2022-09-03 19:58:42,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. 
[default2]:[2022-09-03 19:58:42,922] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 266 [default2]:[2022-09-03 19:58:42,992] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 202 [default1]:[2022-09-03 19:58:43,068] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 73 [default2]:[2022-09-03 19:58:43,185] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 194 [default0]:[2022-09-03 19:58:43,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. [default0]:[2022-09-03 19:58:43,238] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 200 [default7]:[2022-09-03 19:58:43,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. [default7]:[2022-09-03 19:58:43,190] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 207 [default3]:[2022-09-03 19:58:43,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. 
[default3]:[2022-09-03 19:58:43,261] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 195 [default5]:[2022-09-03 19:58:43,287] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 221 [default4]:[2022-09-03 19:58:43,336] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 20 [default1]:[2022-09-03 19:58:43,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-03 19:58:43,376] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 193 [default7]:[2022-09-03 19:58:43,389] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 263 [default1]:[2022-09-03 19:58:43,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-03 19:58:43,613] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 201 [default2]:[2022-09-03 19:58:43,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. 
[default2]:[2022-09-03 19:58:43,716] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 154 [default3]:[2022-09-03 19:58:43,706] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 155 [default3]:[2022-09-03 19:58:43,749] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 251 [default0]: > using checkpoint value 2e-05 for learning rate [default0]: > using checkpoint value 0.0 for minimum learning rate [default0]: > using checkpoint value 0 for warmup iterations [default0]: > using checkpoint value 6348800 for total number of iterations [default0]: > using checkpoint value constant for decay style [default0]:[2022-09-03 19:58:43,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default1]:[2022-09-03 19:58:43,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. [default1]:[2022-09-03 19:58:43,740] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 153 [default5]:[2022-09-03 19:58:43,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default4]:[2022-09-03 19:58:43,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... 
[default6]:[2022-09-03 19:58:43,835] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 254 [default1]:[2022-09-03 19:58:43,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default0]:[2022-09-03 19:58:43,843] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 192 [default5]:[2022-09-03 19:58:44,016] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 205 [default2]:[2022-09-03 19:58:44,157] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 266 [default0]:[2022-09-03 19:58:44,303] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 72 [default0]:[2022-09-03 19:58:44,340] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 200 [default2]:[2022-09-03 19:58:44,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default6]:[2022-09-03 19:58:44,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... [default6]:[2022-09-03 19:58:44,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. 
[default6]:[2022-09-03 19:58:44,386] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 198 [default2]:[2022-09-03 19:58:44,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. [default2]:[2022-09-03 19:58:44,431] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 162 [default4]:[2022-09-03 19:58:44,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-03 19:58:44,376] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 220 [default7]:[2022-09-03 19:58:44,403] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 207 [default3]:[2022-09-03 19:58:44,506] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 195 [default2]:[2022-09-03 19:58:44,499] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 74 [default3]:[2022-09-03 19:58:44,514] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 75 [default7]:[2022-09-03 19:58:44,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... [default3]:[2022-09-03 19:58:44,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
[default6]:[2022-09-03 19:58:44,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. [default6]:[2022-09-03 19:58:44,558] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 222 [default1]:[2022-09-03 19:58:44,732] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 201 [default1]:[2022-09-03 19:58:44,779] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 193 [default1]:[2022-09-03 19:58:44,837] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 153 [default2]:[2022-09-03 19:58:44,992] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 154 [default7]:[2022-09-03 19:58:45,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. [default7]:[2022-09-03 19:58:45,004] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 199 [default0]:[2022-09-03 19:58:45,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-03 19:58:45,021] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 264 [default6]:[2022-09-03 19:58:45,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. 
[default6]:[2022-09-03 19:58:45,032] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 206 [default7]:[2022-09-03 19:58:45,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. [default7]:[2022-09-03 19:58:45,051] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 255 [default2]:[2022-09-03 19:58:45,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. [default2]:[2022-09-03 19:58:45,055] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 18 [default2]:[2022-09-03 19:58:45,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. [default2]:[2022-09-03 19:58:45,207] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 250 [default0]:[2022-09-03 19:58:45,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. 
[default0]:[2022-09-03 19:58:45,201] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 152 [default0]:[2022-09-03 19:58:45,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. [default0]:[2022-09-03 19:58:45,197] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 272 [default1]:[2022-09-03 19:58:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-03 19:58:45,333] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 273 [default3]:[2022-09-03 19:58:45,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-03 19:58:45,289] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 275 [default1]:[2022-09-03 19:58:45,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. 
[default1]:[2022-09-03 19:58:45,433] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 169 [default7]:[2022-09-03 19:58:45,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. [default7]:[2022-09-03 19:58:45,387] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 79 [default1]:[2022-09-03 19:58:45,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-03 19:58:45,394] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 17 [default6]:[2022-09-03 19:58:45,442] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 198 [default5]:[2022-09-03 19:58:45,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. [default5]:[2022-09-03 19:58:45,513] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 197 [default6]:[2022-09-03 19:58:45,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. 
[default6]:[2022-09-03 19:58:45,491] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 182 [default2]:[2022-09-03 19:58:45,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. [default2]:[2022-09-03 19:58:45,466] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 274 [default3]:[2022-09-03 19:58:45,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-03 19:58:45,510] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 203 [default4]:[2022-09-03 19:58:45,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. [default4]:[2022-09-03 19:58:45,565] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 12 [default2]:[2022-09-03 19:58:45,548] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 162 [default4]:[2022-09-03 19:58:45,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. 
[default4]:[2022-09-03 19:58:45,577] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 204 [default4]:[2022-09-03 19:58:45,574] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 220 [default3]:[2022-09-03 19:58:45,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-03 19:58:45,701] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 11 [default6]:[2022-09-03 19:58:45,744] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 222 [default1]:[2022-09-03 19:58:45,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. [default1]:[2022-09-03 19:58:45,713] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 265 [default3]:[2022-09-03 19:58:45,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. [default3]:[2022-09-03 19:58:45,668] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 267 [default2]:[2022-09-03 19:58:45,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. 
[default2]:[2022-09-03 19:58:45,828] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 10 [default0]:[2022-09-03 19:58:45,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-03 19:58:45,783] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 64 [default4]:[2022-09-03 19:58:45,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-03 19:58:45,837] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 180 [default3]:[2022-09-03 19:58:45,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-03 19:58:45,785] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 163 [default0]:[2022-09-03 19:58:45,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. 
[default0]:[2022-09-03 19:58:45,805] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 16 [default7]:[2022-09-03 19:58:45,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. [default7]:[2022-09-03 19:58:45,835] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 23 [default1]:[2022-09-03 19:58:45,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. [default1]:[2022-09-03 19:58:45,852] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 249 [default6]:[2022-09-03 19:58:45,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-03 19:58:45,933] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 22 [default7]:[2022-09-03 19:58:45,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. 
[default7]:[2022-09-03 19:58:45,945] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 279 [default7]:[2022-09-03 19:58:46,061] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 199 [default3]:[2022-09-03 19:58:46,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-03 19:58:46,056] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 35 [default0]:[2022-09-03 19:58:46,085] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 264 [default7]:[2022-09-03 19:58:46,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-03 19:58:46,069] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 271 [default7]:[2022-09-03 19:58:46,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. 
[default7]:[2022-09-03 19:58:46,161] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 223 [default6]:[2022-09-03 19:58:46,199] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 206 [default2]:[2022-09-03 19:58:46,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-03 19:58:46,264] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 170 [default4]:[2022-09-03 19:58:46,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-03 19:58:46,330] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 276 [default2]:[2022-09-03 19:58:46,283] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 18 [default7]:[2022-09-03 19:58:46,360] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 255 [default5]:[2022-09-03 19:58:46,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. 
[default5]:[2022-09-03 19:58:46,385] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 13 [default0]:[2022-09-03 19:58:46,524] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 152 [default4]:[2022-09-03 19:58:46,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-03 19:58:46,516] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 196 [default1]:[2022-09-03 19:58:46,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. [default1]:[2022-09-03 19:58:46,525] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 217 [default2]:[2022-09-03 19:58:46,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. [default2]:[2022-09-03 19:58:46,502] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 178 [default7]:[2022-09-03 19:58:46,485] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 79 [default4]:[2022-09-03 19:58:46,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. 
[default4]:[2022-09-03 19:58:46,484] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 76 [default5]:[2022-09-03 19:58:46,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-03 19:58:46,504] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 37 [default0]:[2022-09-03 19:58:46,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-03 19:58:46,546] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 48 [default1]:[2022-09-03 19:58:46,563] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 169 [default5]:[2022-09-03 19:58:46,622] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 197 [default3]:[2022-09-03 19:58:46,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-03 19:58:46,584] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 67 [default6]:[2022-09-03 19:58:46,614] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 182 [default6]:[2022-09-03 19:58:46,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. 
[default6]:[2022-09-03 19:58:46,644] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 78 [default6]:[2022-09-03 19:58:46,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. [default6]:[2022-09-03 19:58:46,629] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 158 [default4]:[2022-09-03 19:58:46,646] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 204 [default4]:[2022-09-03 19:58:46,730] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 12 [default7]:[2022-09-03 19:58:46,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. [default7]:[2022-09-03 19:58:46,713] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 15 [default0]:[2022-09-03 19:58:46,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. [default0]:[2022-09-03 19:58:46,716] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 8 [default1]:[2022-09-03 19:58:46,736] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 17 [default4]:[2022-09-03 19:58:46,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. 
[default4]:[2022-09-03 19:58:46,830] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 156 [default2]:[2022-09-03 19:58:46,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-03 19:58:46,815] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 218 [default1]:[2022-09-03 19:58:46,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-03 19:58:46,756] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 161 [default0]:[2022-09-03 19:58:46,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. [default0]:[2022-09-03 19:58:46,778] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 32 [default2]:[2022-09-03 19:58:46,908] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 250 [default3]:[2022-09-03 19:58:46,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. 
[default3]:[2022-09-03 19:58:46,918] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 115 [default7]:[2022-09-03 19:58:46,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-03 19:58:46,850] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 183 [default5]:[2022-09-03 19:58:46,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-03 19:58:46,943] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 157 [default1]:[2022-09-03 19:58:46,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. 
[default1]:[2022-09-03 19:58:46,870] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 33 [default3]:[2022-09-03 19:58:46,927] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 275 [default3]:[2022-09-03 19:58:46,937] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 203 [default4]:[2022-09-03 19:58:47,004] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 180 [default0]:[2022-09-03 19:58:46,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-03 19:58:46,993] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 160 [default6]:[2022-09-03 19:58:47,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. 
[default6]:[2022-09-03 19:58:47,027] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 278 [default1]:[2022-09-03 19:58:47,002] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 265 [default1]:[2022-09-03 19:58:47,071] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 249 [default0]:[2022-09-03 19:58:47,071] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 272 [default0]:[2022-09-03 19:58:47,048] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 64 [default3]:[2022-09-03 19:58:47,064] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 163 [default6]:[2022-09-03 19:58:47,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. [default6]:[2022-09-03 19:58:47,092] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 38 [default4]:[2022-09-03 19:58:47,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. [default4]:[2022-09-03 19:58:47,075] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 268 [default2]:[2022-09-03 19:58:47,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. 
[default2]:[2022-09-03 19:58:47,155] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 106 [default0]:[2022-09-03 19:58:47,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. [default0]:[2022-09-03 19:58:47,156] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 176 [default6]:[2022-09-03 19:58:47,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. [default6]:[2022-09-03 19:58:47,193] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 110 [default1]:[2022-09-03 19:58:47,188] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 273 [default0]:[2022-09-03 19:58:47,207] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 16 [default3]:[2022-09-03 19:58:47,167] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 267 [default2]:[2022-09-03 19:58:47,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. 
[default2]:[2022-09-03 19:58:47,334] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 226 [default3]:[2022-09-03 19:58:47,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. [default3]:[2022-09-03 19:58:47,275] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 219 [default6]:[2022-09-03 19:58:47,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. [default6]:[2022-09-03 19:58:47,298] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 46 [default5]:[2022-09-03 19:58:47,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-03 19:58:47,425] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 213 [default5]:[2022-09-03 19:58:47,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. 
[default5]:[2022-09-03 19:58:47,422] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 181 [default3]:[2022-09-03 19:58:47,405] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 11 [default7]:[2022-09-03 19:58:47,380] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 223 [default3]:[2022-09-03 19:58:47,376] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 35 [default2]:[2022-09-03 19:58:47,418] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 274 [default5]:[2022-09-03 19:58:47,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-03 19:58:47,359] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 277 [default7]:[2022-09-03 19:58:47,432] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 271 [default2]:[2022-09-03 19:58:47,488] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 170 [default0]:[2022-09-03 19:58:47,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. [default0]:[2022-09-03 19:58:47,445] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 40 [default0]:[2022-09-03 19:58:47,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. 
[default0]:[2022-09-03 19:58:47,454] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 208 [default2]:[2022-09-03 19:58:47,533] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 10 [default2]:[2022-09-03 19:58:47,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. [default2]:[2022-09-03 19:58:47,527] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 138 [default7]:[2022-09-03 19:58:47,467] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 279 [default4]:[2022-09-03 19:58:47,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. [default4]:[2022-09-03 19:58:47,503] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 44 [default6]:[2022-09-03 19:58:47,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. [default6]:[2022-09-03 19:58:47,570] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 94 [default4]:[2022-09-03 19:58:47,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. 
[default4]:[2022-09-03 19:58:47,541] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 212 [default1]:[2022-09-03 19:58:47,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-03 19:58:47,569] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 209 [default0]:[2022-09-03 19:58:47,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. [default0]:[2022-09-03 19:58:47,589] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 136 [default4]:[2022-09-03 19:58:47,608] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 196 [default1]:[2022-09-03 19:58:47,608] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 217 [default1]:[2022-09-03 19:58:47,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. [default1]:[2022-09-03 19:58:47,579] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 257 [default6]:[2022-09-03 19:58:47,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. 
[default6]:[2022-09-03 19:58:47,637] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 270 [default2]:[2022-09-03 19:58:47,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. [default2]:[2022-09-03 19:58:47,732] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 82 [default0]:[2022-09-03 19:58:47,711] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 48 [default1]:[2022-09-03 19:58:47,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-03 19:58:47,664] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 105 [default6]:[2022-09-03 19:58:47,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. [default6]:[2022-09-03 19:58:47,669] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 214 [default7]:[2022-09-03 19:58:47,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. 
[default7]:[2022-09-03 19:58:47,687] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 215 [default1]:[2022-09-03 19:58:47,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-03 19:58:47,652] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 65 [default2]:[2022-09-03 19:58:47,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. [default2]:[2022-09-03 19:58:47,672] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 66 [default6]:[2022-09-03 19:58:47,715] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 158 [default3]:[2022-09-03 19:58:47,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-03 19:58:47,692] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 19 [default5]:[2022-09-03 19:58:47,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. 
[default5]:[2022-09-03 19:58:47,741] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 261 [default1]:[2022-09-03 19:58:47,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. [default1]:[2022-09-03 19:58:47,698] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 145 [default4]:[2022-09-03 19:58:47,684] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 276 [default4]:[2022-09-03 19:58:47,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. [default4]:[2022-09-03 19:58:47,742] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 52 [default6]:[2022-09-03 19:58:47,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-03 19:58:47,763] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 14 [default2]:[2022-09-03 19:58:47,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. 
[default2]:[2022-09-03 19:58:47,792] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 42 [default2]:[2022-09-03 19:58:47,764] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 178 [default4]:[2022-09-03 19:58:47,804] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 76 [default3]:[2022-09-03 19:58:47,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-03 19:58:47,840] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 43 [default3]:[2022-09-03 19:58:47,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-03 19:58:47,822] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 235 [default5]:[2022-09-03 19:58:47,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-03 19:58:47,768] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 77 [default6]:[2022-09-03 19:58:47,818] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 22 [default5]:[2022-09-03 19:58:47,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. 
[default5]:[2022-09-03 19:58:47,824] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 21 [default7]:[2022-09-03 19:58:47,773] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 23 [default5]:[2022-09-03 19:58:47,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-03 19:58:47,834] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 269 [default3]:[2022-09-03 19:58:47,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-03 19:58:47,919] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 179 [default5]:[2022-09-03 19:58:47,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-03 19:58:47,855] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 69 [default1]:[2022-09-03 19:58:47,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. 
[default1]:[2022-09-03 19:58:47,878] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 177 [default6]:[2022-09-03 19:58:47,881] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 78 [default0]:[2022-09-03 19:58:47,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-03 19:58:47,887] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 232 [default5]:[2022-09-03 19:58:47,880] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 37 [default7]:[2022-09-03 19:58:47,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-03 19:58:47,902] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 247 [default2]:[2022-09-03 19:58:47,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. [default2]:[2022-09-03 19:58:47,946] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 258 [default0]:[2022-09-03 19:58:47,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. 
[default0]:[2022-09-03 19:58:47,967] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 224 [default3]:[2022-09-03 19:58:47,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. [default3]:[2022-09-03 19:58:47,972] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 227 [default7]:[2022-09-03 19:58:47,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. [default7]:[2022-09-03 19:58:47,996] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 39 [default6]:[2022-09-03 19:58:47,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. [default6]:[2022-09-03 19:58:47,969] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 262 [default6]:[2022-09-03 19:58:48,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. 
[default6]:[2022-09-03 19:58:48,087] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 54 [default3]:[2022-09-03 19:58:48,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. [default3]:[2022-09-03 19:58:48,124] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 211 [default3]:[2022-09-03 19:58:48,066] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 67 [default0]:[2022-09-03 19:58:48,068] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 8 [default1]:[2022-09-03 19:58:48,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-03 19:58:48,097] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 233 [default6]:[2022-09-03 19:58:48,115] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 278 [default3]:[2022-09-03 19:58:48,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-03 19:58:48,151] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 83 [default4]:[2022-09-03 19:58:48,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. 
[default4]:[2022-09-03 19:58:48,156] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 252 [default2]:[2022-09-03 19:58:48,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. [default2]:[2022-09-03 19:58:48,190] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 50 [default5]:[2022-09-03 19:58:48,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. [default5]:[2022-09-03 19:58:48,160] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 53 [default3]:[2022-09-03 19:58:48,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. [default3]:[2022-09-03 19:58:48,164] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 107 [default3]:[2022-09-03 19:58:48,162] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 115 [default1]:[2022-09-03 19:58:48,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. 
[default1]:[2022-09-03 19:58:48,177] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 41 [default6]:[2022-09-03 19:58:48,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. [default6]:[2022-09-03 19:58:48,148] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 174 [default1]:[2022-09-03 19:58:48,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-03 19:58:48,165] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 9 [default5]:[2022-09-03 19:58:48,240] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 13 [default4]:[2022-09-03 19:58:48,180] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 156 [default1]:[2022-09-03 19:58:48,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-03 19:58:48,227] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 225 [default0]:[2022-09-03 19:58:48,229] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 32 [default7]:[2022-09-03 19:58:48,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. 
[default7]:[2022-09-03 19:58:48,202] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 239 [default4]:[2022-09-03 19:58:48,179] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 268 [default5]:[2022-09-03 19:58:48,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. [default5]:[2022-09-03 19:58:48,290] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 253 [default5]:[2022-09-03 19:58:48,332] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 157 [default2]:[2022-09-03 19:58:48,287] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 218 [default1]:[2022-09-03 19:58:48,276] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 33 [default7]:[2022-09-03 19:58:48,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. [default7]:[2022-09-03 19:58:48,275] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 47 [default0]:[2022-09-03 19:58:48,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. 
[default0]:[2022-09-03 19:58:48,405] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 168 [default7]:[2022-09-03 19:58:48,389] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 15 [default7]:[2022-09-03 19:58:48,425] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 183 [default7]:[2022-09-03 19:58:48,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. [default7]:[2022-09-03 19:58:48,402] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 159 [default1]:[2022-09-03 19:58:48,445] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 161 [default6]:[2022-09-03 19:58:48,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-03 19:58:48,357] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 166 [default7]:[2022-09-03 19:58:48,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. 
[default7]:[2022-09-03 19:58:48,439] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 167 [default6]:[2022-09-03 19:58:48,393] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 38 [default5]:[2022-09-03 19:58:48,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. [default5]:[2022-09-03 19:58:48,419] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 237 [default5]:[2022-09-03 19:58:48,438] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 277 [default0]:[2022-09-03 19:58:48,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-03 19:58:48,505] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 248 [default1]:[2022-09-03 19:58:48,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-03 19:58:48,533] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 49 [default1]:[2022-09-03 19:58:48,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. 
[default1]:[2022-09-03 19:58:48,456] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 113 [default2]:[2022-09-03 19:58:48,518] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 226 [default3]:[2022-09-03 19:58:48,500] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 219 [default0]:[2022-09-03 19:58:48,519] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 160 [default2]:[2022-09-03 19:58:48,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-03 19:58:48,481] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 146 [default2]:[2022-09-03 19:58:48,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-03 19:58:48,486] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 234 [default4]:[2022-09-03 19:58:48,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. 
[default4]:[2022-09-03 19:58:48,483] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 36 [default5]:[2022-09-03 19:58:48,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. [default5]:[2022-09-03 19:58:48,506] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 45 [default7]:[2022-09-03 19:58:48,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-03 19:58:48,597] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 55 [default2]:[2022-09-03 19:58:48,627] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 106 [default3]:[2022-09-03 19:58:48,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. [default3]:[2022-09-03 19:58:48,595] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 171 [default7]:[2022-09-03 19:58:48,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. 
[default7]:[2022-09-03 19:58:48,605] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 111 [default0]:[2022-09-03 19:58:48,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. [default0]:[2022-09-03 19:58:48,545] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 112 [default2]:[2022-09-03 19:58:48,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-03 19:58:48,602] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 114 [default5]:[2022-09-03 19:58:48,611] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 213 [default7]:[2022-09-03 19:58:48,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. [default7]:[2022-09-03 19:58:48,556] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 175 [default1]:[2022-09-03 19:58:48,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. 
[default1]:[2022-09-03 19:58:48,582] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 137 [default0]:[2022-09-03 19:58:48,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-03 19:58:48,612] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 144 [default6]:[2022-09-03 19:58:48,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. [default6]:[2022-09-03 19:58:48,573] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 230 [default3]:[2022-09-03 19:58:48,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-03 19:58:48,631] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 147 [default4]:[2022-09-03 19:58:48,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. 
[default4]:[2022-09-03 19:58:48,595] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 260 [default3]:[2022-09-03 19:58:48,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-03 19:58:48,684] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 51 [default0]:[2022-09-03 19:58:48,714] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 176 [default6]:[2022-09-03 19:58:48,715] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 110 [default7]:[2022-09-03 19:58:48,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. [default7]:[2022-09-03 19:58:48,720] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 231 [default6]:[2022-09-03 19:58:48,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. [default6]:[2022-09-03 19:58:48,742] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 190 [default4]:[2022-09-03 19:58:48,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. 
[default4]:[2022-09-03 19:58:48,682] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 236 [default0]:[2022-09-03 19:58:48,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. [default0]:[2022-09-03 19:58:48,711] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 256 [default6]:[2022-09-03 19:58:48,735] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 94 [default7]:[2022-09-03 19:58:48,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-03 19:58:48,783] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 95 [default4]:[2022-09-03 19:58:48,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. [default4]:[2022-09-03 19:58:48,811] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 172 [default0]:[2022-09-03 19:58:48,804] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 208 [default5]:[2022-09-03 19:58:48,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. 
[default5]:[2022-09-03 19:58:48,742] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 173 [default2]:[2022-09-03 19:58:48,752] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 138 [default1]:[2022-09-03 19:58:48,840] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 65 [default2]:[2022-09-03 19:58:48,814] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 66 [default4]:[2022-09-03 19:58:48,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-03 19:58:48,785] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 228 [default5]:[2022-09-03 19:58:48,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-03 19:58:48,807] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 229 [default5]:[2022-09-03 19:58:48,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. 
[default5]:[2022-09-03 19:58:48,766] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 165 [default2]:[2022-09-03 19:58:48,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. [default2]:[2022-09-03 19:58:48,826] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 34 [default3]:[2022-09-03 19:58:48,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-03 19:58:48,827] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 259 [default6]:[2022-09-03 19:58:48,806] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 270 [default7]:[2022-09-03 19:58:48,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. [default7]:[2022-09-03 19:58:48,884] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 87 [default1]:[2022-09-03 19:58:48,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. 
[default1]:[2022-09-03 19:58:48,933] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 81 [default1]:[2022-09-03 19:58:48,935] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 105 [default0]:[2022-09-03 19:58:48,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-03 19:58:48,895] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 104 [default2]:[2022-09-03 19:58:48,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-03 19:58:48,910] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 210 [default1]:[2022-09-03 19:58:48,867] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 209 [default0]:[2022-09-03 19:58:48,938] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 136 [default6]:[2022-09-03 19:58:48,932] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 14 [default7]:[2022-09-03 19:58:48,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. 
[default7]:[2022-09-03 19:58:48,906] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 71 [default6]:[2022-09-03 19:58:48,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. [default6]:[2022-09-03 19:58:48,925] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 70 [default4]:[2022-09-03 19:58:48,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. [default4]:[2022-09-03 19:58:48,913] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 164 [default5]:[2022-09-03 19:58:48,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-03 19:58:48,896] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 149 [default5]:[2022-09-03 19:58:48,866] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 77 [default1]:[2022-09-03 19:58:48,931] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 145 [default1]:[2022-09-03 19:58:48,922] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 257 [default7]:[2022-09-03 19:58:48,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. 
[default7]:[2022-09-03 19:58:48,917] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 63 [default4]:[2022-09-03 19:58:48,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-03 19:58:48,988] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 92 [default2]:[2022-09-03 19:58:49,006] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 82 [default4]:[2022-09-03 19:58:49,015] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 52 [default4]:[2022-09-03 19:58:48,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-03 19:58:48,945] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 108 [default5]:[2022-09-03 19:58:48,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. [default5]:[2022-09-03 19:58:48,950] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 109 [default4]:[2022-09-03 19:58:48,973] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 212 [default6]:[2022-09-03 19:58:48,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. 
[default6]:[2022-09-03 19:58:48,987] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 118 [default7]:[2022-09-03 19:58:48,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-03 19:58:48,964] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 119 [default6]:[2022-09-03 19:58:48,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. [default6]:[2022-09-03 19:58:48,955] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 246 [default2]:[2022-09-03 19:58:48,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. [default2]:[2022-09-03 19:58:48,954] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 130 [default3]:[2022-09-03 19:58:49,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. 
[default3]:[2022-09-03 19:58:49,052] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 243 [default5]:[2022-09-03 19:58:49,061] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 269 [default5]:[2022-09-03 19:58:49,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. [default5]:[2022-09-03 19:58:49,117] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 93 [default5]:[2022-09-03 19:58:49,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-03 19:58:49,108] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 85 [default4]:[2022-09-03 19:58:49,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-03 19:58:49,142] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 116 [default5]:[2022-09-03 19:58:49,134] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 69 [default7]:[2022-09-03 19:58:49,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. 
[default7]:[2022-09-03 19:58:49,067] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 127 [default0]:[2022-09-03 19:58:49,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. [default0]:[2022-09-03 19:58:49,054] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 56 [default5]:[2022-09-03 19:58:49,134] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 21 [default6]:[2022-09-03 19:58:49,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-03 19:58:49,058] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 142 [default3]:[2022-09-03 19:58:49,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-03 19:58:49,096] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 139 [default0]:[2022-09-03 19:58:49,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. 
[default0]:[2022-09-03 19:58:49,189] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 80 [default2]:[2022-09-03 19:58:49,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. [default2]:[2022-09-03 19:58:49,181] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 122 [default7]:[2022-09-03 19:58:49,188] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 215 [default3]:[2022-09-03 19:58:49,240] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 19 [default7]:[2022-09-03 19:58:49,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-03 19:58:49,208] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 151 [default5]:[2022-09-03 19:58:49,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. [default5]:[2022-09-03 19:58:49,179] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 189 [default2]:[2022-09-03 19:58:49,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. 
[default2]:[2022-09-03 19:58:49,156] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 242 [default2]:[2022-09-03 19:58:49,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-03 19:58:49,157] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 58 [default6]:[2022-09-03 19:58:49,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. [default6]:[2022-09-03 19:58:49,191] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 238 [default6]:[2022-09-03 19:58:49,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-03 19:58:49,281] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 86 [default6]:[2022-09-03 19:58:49,269] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 214 [default5]:[2022-09-03 19:58:49,284] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 181 [default5]:[2022-09-03 19:58:49,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. 
[default5]:[2022-09-03 19:58:49,264] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 125 [default5]:[2022-09-03 19:58:49,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-03 19:58:49,285] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 117 [default4]:[2022-09-03 19:58:49,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-03 19:58:49,296] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 68 [default1]:[2022-09-03 19:58:49,320] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 177 [default3]:[2022-09-03 19:58:49,315] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 227 [default2]:[2022-09-03 19:58:49,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. 
[default2]:[2022-09-03 19:58:49,299] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 186 [default3]:[2022-09-03 19:58:49,326] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 235 [default6]:[2022-09-03 19:58:49,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-03 19:58:49,287] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 150 [default7]:[2022-09-03 19:58:49,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-03 19:58:49,341] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 191 [default7]:[2022-09-03 19:58:49,256] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 247 [default7]:[2022-09-03 19:58:49,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-03 19:58:49,296] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 143 [default5]:[2022-09-03 19:58:49,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. 
[default5]:[2022-09-03 19:58:49,277] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 141 [default1]:[2022-09-03 19:58:49,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. [default1]:[2022-09-03 19:58:49,276] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 241 [default5]:[2022-09-03 19:58:49,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. [default5]:[2022-09-03 19:58:49,309] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 61 [default5]:[2022-09-03 19:58:49,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-03 19:58:49,342] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 245 [default1]:[2022-09-03 19:58:49,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. 
[default1]:[2022-09-03 19:58:49,415] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 89 [default1]:[2022-09-03 19:58:49,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. [default1]:[2022-09-03 19:58:49,395] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 121 [default4]:[2022-09-03 19:58:49,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. [default4]:[2022-09-03 19:58:49,412] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 124 [default4]:[2022-09-03 19:58:49,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. [default4]:[2022-09-03 19:58:49,439] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 84 [default7]:[2022-09-03 19:58:49,447] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 159 [default4]:[2022-09-03 19:58:49,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. 
[default4]:[2022-09-03 19:58:49,443] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 244 [default4]:[2022-09-03 19:58:49,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-03 19:58:49,366] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 140 [default4]:[2022-09-03 19:58:49,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-03 19:58:49,360] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 148 [default6]:[2022-09-03 19:58:49,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-03 19:58:49,444] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 62 [default3]:[2022-09-03 19:58:49,488] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 83 [default0]:[2022-09-03 19:58:49,523] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 40 [default4]:[2022-09-03 19:58:49,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. 
[default4]:[2022-09-03 19:58:49,524] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 188 [default0]:[2022-09-03 19:58:49,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. [default0]:[2022-09-03 19:58:49,531] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 128 [default4]:[2022-09-03 19:58:49,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. [default4]:[2022-09-03 19:58:49,553] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 60 [default0]:[2022-09-03 19:58:49,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. [default0]:[2022-09-03 19:58:49,595] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 88 [default3]:[2022-09-03 19:58:49,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. 
[default3]:[2022-09-03 19:58:49,593] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 91 [default0]:[2022-09-03 19:58:49,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-03 19:58:49,550] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 120 [default3]:[2022-09-03 19:58:49,587] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 211 [default6]:[2022-09-03 19:58:49,583] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 174 [default1]:[2022-09-03 19:58:49,615] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 137 [default1]:[2022-09-03 19:58:49,585] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 9 [default4]:[2022-09-03 19:58:49,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-03 19:58:49,621] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 100 [default0]:[2022-09-03 19:58:49,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. 
[default0]:[2022-09-03 19:58:49,607] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 240 [default3]:[2022-09-03 19:58:49,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-03 19:58:49,628] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 131 [default5]:[2022-09-03 19:58:49,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. [default5]:[2022-09-03 19:58:49,582] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 133 [default7]:[2022-09-03 19:58:49,636] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 39 [default1]:[2022-09-03 19:58:49,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-03 19:58:49,620] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 185 [default6]:[2022-09-03 19:58:49,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. 
[default6]:[2022-09-03 19:58:49,628] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 134 [default1]:[2022-09-03 19:58:49,686] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 49 [default3]:[2022-09-03 19:58:49,693] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 171 [default0]:[2022-09-03 19:58:49,688] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 224 [default1]:[2022-09-03 19:58:49,691] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 225 [default1]:[2022-09-03 19:58:49,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. [default1]:[2022-09-03 19:58:49,657] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 57 [default2]:[2022-09-03 19:58:49,701] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 258 [default0]:[2022-09-03 19:58:49,790] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 248 [default2]:[2022-09-03 19:58:49,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. 
[default2]:[2022-09-03 19:58:49,765] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 90 [default4]:[2022-09-03 19:58:49,811] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 252 [default0]:[2022-09-03 19:58:49,816] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 168 [default6]:[2022-09-03 19:58:49,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. [default6]:[2022-09-03 19:58:49,808] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 126 [default0]:[2022-09-03 19:58:49,791] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 232 [default0]:[2022-09-03 19:58:49,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. [default0]:[2022-09-03 19:58:49,824] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 184 [default7]:[2022-09-03 19:58:49,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. 
[default7]:[2022-09-03 19:58:49,852] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 135 [default5]:[2022-09-03 19:58:49,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-03 19:58:49,837] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 5 [default3]:[2022-09-03 19:58:49,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. [default3]:[2022-09-03 19:58:49,803] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 187 [default5]:[2022-09-03 19:58:49,924] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 253 [default2]:[2022-09-03 19:58:49,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. [default2]:[2022-09-03 19:58:49,887] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 98 [default3]:[2022-09-03 19:58:49,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. 
[default3]:[2022-09-03 19:58:49,863] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 123 [default1]:[2022-09-03 19:58:49,874] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 113 [default2]:[2022-09-03 19:58:49,881] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 42 [default6]:[2022-09-03 19:58:49,895] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 166 [default4]:[2022-09-03 19:58:49,861] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 36 [default1]:[2022-09-03 19:58:49,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-03 19:58:49,949] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 129 [default6]:[2022-09-03 19:58:49,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. [default6]:[2022-09-03 19:58:49,910] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 6 [default7]:[2022-09-03 19:58:49,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. 
[default7]:[2022-09-03 19:58:49,883] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 7 [default3]:[2022-09-03 19:58:49,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. [default3]:[2022-09-03 19:58:49,873] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 59 [default2]:[2022-09-03 19:58:50,035] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 50 [default3]:[2022-09-03 19:58:50,000] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 179 [default6]:[2022-09-03 19:58:50,020] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 230 [default6]:[2022-09-03 19:58:49,995] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 190 [default3]:[2022-09-03 19:58:50,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. [default3]:[2022-09-03 19:58:50,121] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 27 [default5]:[2022-09-03 19:58:50,059] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 53 [default6]:[2022-09-03 19:58:50,096] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 54 [default0]:[2022-09-03 19:58:50,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. 
[default0]:[2022-09-03 19:58:50,058] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 24 [default4]:[2022-09-03 19:58:50,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. [default4]:[2022-09-03 19:58:50,065] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 132 [default7]:[2022-09-03 19:58:50,079] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 231 [default4]:[2022-09-03 19:58:50,058] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 228 [default5]:[2022-09-03 19:58:50,074] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 229 [default2]:[2022-09-03 19:58:50,059] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 34 [default0]:[2022-09-03 19:58:50,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. [default0]:[2022-09-03 19:58:50,191] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 96 [default5]:[2022-09-03 19:58:50,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. 
[default5]:[2022-09-03 19:58:50,140] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 29 [default3]:[2022-09-03 19:58:50,223] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 107 [default1]:[2022-09-03 19:58:50,213] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 41 [default7]:[2022-09-03 19:58:50,157] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 175 [default3]:[2022-09-03 19:58:50,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-03 19:58:50,212] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 99 [default5]:[2022-09-03 19:58:50,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-03 19:58:50,202] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 101 [default6]:[2022-09-03 19:58:50,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. 
[default6]:[2022-09-03 19:58:50,171] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 102 [default0]:[2022-09-03 19:58:50,165] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 144 [default1]:[2022-09-03 19:58:50,228] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 233 [default6]:[2022-09-03 19:58:50,220] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 46 [default2]:[2022-09-03 19:58:50,308] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 210 [default2]:[2022-09-03 19:58:50,322] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 114 [default1]:[2022-09-03 19:58:50,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. [default1]:[2022-09-03 19:58:50,257] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 97 [default4]:[2022-09-03 19:58:50,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. [default4]:[2022-09-03 19:58:50,393] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 28 [default7]:[2022-09-03 19:58:50,355] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 55 [default6]:[2022-09-03 19:58:50,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. 
[default6]:[2022-09-03 19:58:50,409] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 30 [default2]:[2022-09-03 19:58:50,394] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 122 [default7]:[2022-09-03 19:58:50,344] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 111 [default7]:[2022-09-03 19:58:50,446] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 127 [default7]:[2022-09-03 19:58:50,352] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 167 [default5]:[2022-09-03 19:58:50,362] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 261 [default2]:[2022-09-03 19:58:50,410] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 146 [default0]:[2022-09-03 19:58:50,417] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 256 [default0]:[2022-09-03 19:58:50,472] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 104 [default2]:[2022-09-03 19:58:50,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. [default2]:[2022-09-03 19:58:50,464] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 26 [default1]:[2022-09-03 19:58:50,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. 
[default1]:[2022-09-03 19:58:50,463] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 25 [default0]:[2022-09-03 19:58:50,510] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 112 [default7]:[2022-09-03 19:58:50,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. [default7]:[2022-09-03 19:58:50,526] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 103 [default7]:[2022-09-03 19:58:50,527] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 71 [default4]:[2022-09-03 19:58:50,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. 
[default4]:[2022-09-03 19:58:50,471] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 4 [default3]:[2022-09-03 19:58:50,633] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 51 [default5]:[2022-09-03 19:58:50,544] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 125 [default6]:[2022-09-03 19:58:50,560] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 70 [default5]:[2022-09-03 19:58:50,593] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 165 [default3]:[2022-09-03 19:58:50,576] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 147 [default4]:[2022-09-03 19:58:50,679] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 172 [default7]:[2022-09-03 19:58:50,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. 
[default7]:[2022-09-03 19:58:50,654] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 31 [default1]:[2022-09-03 19:58:50,653] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 121 [default4]:[2022-09-03 19:58:50,729] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 124 [default5]:[2022-09-03 19:58:50,665] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 173 [default4]:[2022-09-03 19:58:50,664] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 68 [default4]:[2022-09-03 19:58:50,669] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 164 [default3]:[2022-09-03 19:58:50,748] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 43 [default0]:[2022-09-03 19:58:50,738] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 80 [default4]:[2022-09-03 19:58:50,801] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 108 [default5]:[2022-09-03 19:58:50,811] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 109 [default4]:[2022-09-03 19:58:50,842] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 100 [default0]:[2022-09-03 19:58:50,789] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 56 [default6]:[2022-09-03 19:58:50,787] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 262 [default0]:[2022-09-03 19:58:50,864] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 120 [default2]:[2022-09-03 19:58:50,863] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for 
rank 234 [default6]:[2022-09-03 19:58:50,996] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 126 [default7]:[2022-09-03 19:58:50,965] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 47 [default4]:[2022-09-03 19:58:51,017] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 260 [default0]:[2022-09-03 19:58:51,121] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 24 [default3]:[2022-09-03 19:58:51,119] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 123 [default4]:[2022-09-03 19:58:51,106] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 44 [default2]:[2022-09-03 19:58:51,094] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 186 [default7]:[2022-09-03 19:58:51,106] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 239 [default7]:[2022-09-03 19:58:51,091] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 143 [default2]:[2022-09-03 19:58:51,131] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 58 [default7]:[2022-09-03 19:58:51,167] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 87 [default1]:[2022-09-03 19:58:51,175] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 81 [default3]:[2022-09-03 19:58:51,238] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 259 [default3]:[2022-09-03 19:58:51,253] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 243 [default5]:[2022-09-03 19:58:51,220] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 45 
[default6]:[2022-09-03 19:58:51,335] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 118 [default7]:[2022-09-03 19:58:51,335] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 119 [default6]:[2022-09-03 19:58:51,274] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 142 [default1]:[2022-09-03 19:58:51,348] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 89 [default2]:[2022-09-03 19:58:51,441] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 130 [default5]:[2022-09-03 19:58:51,439] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 149 [default6]:[2022-09-03 19:58:51,416] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 150 [default1]:[2022-09-03 19:58:51,443] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 241 [default3]:[2022-09-03 19:58:51,384] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 139 [default5]:[2022-09-03 19:58:51,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. [default5]:[2022-09-03 19:58:51,446] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 285 [default6]:[2022-09-03 19:58:51,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. 
[default6]:[2022-09-03 19:58:51,366] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 286 [default7]:[2022-09-03 19:58:51,432] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 7 [default5]:[2022-09-03 19:58:51,436] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 5 [default6]:[2022-09-03 19:58:51,527] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 86 [default6]:[2022-09-03 19:58:51,528] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 246 [default7]:[2022-09-03 19:58:51,480] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 151 [default6]:[2022-09-03 19:58:51,491] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 6 [default4]:[2022-09-03 19:58:51,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-03 19:58:51,503] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 284 [default0]:[2022-09-03 19:58:51,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[default0]:[2022-09-03 19:58:51,603] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 0 [default4]:[2022-09-03 19:58:51,574] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 116 [default5]:[2022-09-03 19:58:51,592] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 117 [default5]:[2022-09-03 19:58:51,633] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 141 [default4]:[2022-09-03 19:58:51,566] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 148 [default7]:[2022-09-03 19:58:51,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. [default7]:[2022-09-03 19:58:51,580] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 287 [default0]:[2022-09-03 19:58:51,695] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 88 [default4]:[2022-09-03 19:58:51,670] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 140 [default5]:[2022-09-03 19:58:51,726] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 237 [default1]:[2022-09-03 19:58:51,722] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 185 [default2]:[2022-09-03 19:58:51,720] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 242 [default5]:[2022-09-03 19:58:51,756] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 85 [default1]:[2022-09-03 19:58:51,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-03 19:58:51,776] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 1 [default4]:[2022-09-03 19:58:51,816] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 84 [default1]:[2022-09-03 19:58:51,755] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 57 [default4]:[2022-09-03 19:58:51,781] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 236 [default6]:[2022-09-03 19:58:51,818] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 238 [default3]:[2022-09-03 19:58:51,856] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 91 [default2]:[2022-09-03 19:58:51,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. [default2]:[2022-09-03 19:58:51,915] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 2 [default1]:[2022-09-03 19:58:51,915] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 129 [default3]:[2022-09-03 19:58:51,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step5/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
[default3]:[2022-09-03 19:58:51,938] [INFO] [engine.py:2833:_get_all_zero_checkpoint_state_dicts] successfully read 4 ZeRO state_dicts for rank 3 [default4]:[2022-09-03 19:58:51,949] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 4 [default3]:[2022-09-03 19:58:51,993] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 131 [default0]:[2022-09-03 19:58:52,075] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 184 [default3]:[2022-09-03 19:58:52,144] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 27 [default7]:[2022-09-03 19:58:52,195] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 95 [default4]:[2022-09-03 19:58:52,302] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 92 [default0]:[2022-09-03 19:58:52,306] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 240 [default1]:[2022-09-03 19:58:52,412] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 25 [default2]:[2022-09-03 19:58:52,428] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 98 [default0]:[2022-09-03 19:58:52,371] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 128 [default4]:[2022-09-03 19:58:52,411] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 244 [default5]:[2022-09-03 19:58:52,394] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 245 [default5]:[2022-09-03 19:58:52,529] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 93 [default3]:[2022-09-03 19:58:52,483] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 99 
[default7]:[2022-09-03 19:58:52,551] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 191 [default2]:[2022-09-03 19:58:52,593] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 90 [default2]:[2022-09-03 19:58:52,687] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 26 [default5]:[2022-09-03 19:58:52,721] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 189 [default7]:[2022-09-03 19:58:52,670] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 63 [default3]:[2022-09-03 19:58:52,692] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 187 [default0]:[2022-09-03 19:58:52,838] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 96 [default3]:[2022-09-03 19:58:52,838] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 59 [default1]:[2022-09-03 19:58:52,846] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 97 [default4]:[2022-09-03 19:58:52,856] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 188 [default6]:[2022-09-03 19:58:52,905] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 134 [default6]:[2022-09-03 19:58:52,908] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 62 [default7]:[2022-09-03 19:58:52,958] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 135 [default5]:[2022-09-03 19:58:53,042] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 101 [default5]:[2022-09-03 19:58:52,967] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 133 
[default4]:[2022-09-03 19:58:53,046] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 132 [default5]:[2022-09-03 19:58:53,034] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 61 [default4]:[2022-09-03 19:58:53,069] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 60 [default6]:[2022-09-03 19:58:53,258] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 30 [default7]:[2022-09-03 19:58:53,338] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 31 [default5]:[2022-09-03 19:58:53,391] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 29 [default7]:[2022-09-03 19:58:53,439] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 103 [default4]:[2022-09-03 19:58:53,451] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 28 [default6]:[2022-09-03 19:58:53,470] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 102 [default0]:[2022-09-03 19:58:53,770] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 0 [default0]: checkpoint version 3.0 [default1]:[2022-09-03 19:58:54,222] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 1 [default6]:[2022-09-03 19:58:54,263] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 286 [default5]:[2022-09-03 19:58:54,625] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 285 [default2]:[2022-09-03 19:58:54,573] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 2 [default4]:[2022-09-03 19:58:54,664] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition 
checkpoints for rank 284 [default3]:[2022-09-03 19:58:54,582] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 3 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 5 [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-03 19:58:54 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 26624 [default0]: test: 2048 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default7]:[2022-09-03 19:58:54,691] [INFO] [engine.py:2767:_load_zero_checkpoint] loading 4 zero partition checkpoints for rank 287 [default7]:time (ms) | load-checkpoint: 24867.97 [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.125389 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.029582 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.002885 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.143 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.032716 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_885ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.058 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.111130 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_301ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.067 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.081664 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_3486ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.082 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.069575 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_5933ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.076 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.082067 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_2855ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.144 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.124913 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_42ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.049 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.140437 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_3493ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.136 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.086820 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_293ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.059 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.040202 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_3ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.008 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.068821 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_147ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.053 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.067371 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_11ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.030 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.157440 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_200ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.085 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.095398 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_17ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.013 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.058363 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_28ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.059 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.044677 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.072 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.059227 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_18ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.047 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.065135 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_10ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.026 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.072730 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.004 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.073541 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_57ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.068 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.048247 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_25ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.069 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.055464 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_34ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.007 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.057274 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_9ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.003 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.063823 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_2178ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.071 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.055937 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_1480ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.081 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.075307 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_1326ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.064 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.163619 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_659ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.080 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.120330 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_3236ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.119 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.040222 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_14ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.015 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios: [default0]: dataset 0, input: 0.0330676, achieved: 0.0330676 [default0]: dataset 1, input: 0.0112421, achieved: 0.0112421 [default0]: dataset 2, input: 0.130272, achieved: 0.130272 [default0]: dataset 3, input: 0.221712, achieved: 0.221712 [default0]: dataset 4, input: 0.106678, achieved: 0.106678 [default0]: dataset 5, input: 0.00155951, achieved: 0.00155955 [default0]: dataset 6, input: 0.13054, achieved: 0.13054 [default0]: dataset 7, input: 0.010918, achieved: 0.0109181 [default0]: dataset 8, input: 0.000110214, achieved: 0.000110257 [default0]: dataset 9, input: 0.00549238, achieved: 0.00549235 [default0]: dataset 10, input: 0.000402122, achieved: 0.000402094 [default0]: dataset 11, input: 0.00747007, achieved: 0.00747007 [default0]: dataset 12, input: 0.000619047, achieved: 0.000619024 [default0]: dataset 13, input: 0.00103353, achieved: 0.0010336 [default0]: dataset 14, input: 0.000501201, achieved: 0.000501226 [default0]: dataset 15, input: 0.000667277, achieved: 0.000667231 [default0]: dataset 16, input: 0.000359281, achieved: 0.000359326 [default0]: dataset 17, input: 0.000508443, achieved: 0.000508519 [default0]: dataset 18, input: 0.00211373, achieved: 0.0021138 [default0]: dataset 19, input: 0.000912995, achieved: 0.000912961 [default0]: dataset 20, input: 0.00124543, achieved: 0.00124546 [default0]: dataset 21, input: 0.000315887, achieved: 0.00031594 [default0]: dataset 22, input: 0.0813721, achieved: 0.0813721 [default0]: dataset 23, input: 0.0552939, achieved: 0.0552939 [default0]: dataset 24, input: 0.0495415, achieved: 0.0495414 [default0]: dataset 25, input: 0.0246164, achieved: 0.0246163 [default0]: dataset 26, input: 0.120917, achieved: 0.120917 [default0]: dataset 27, input: 0.000517703, achieved: 0.000517666 [default0]:> elapsed time for building blendable dataset indices: 0.32 (sec) [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.061785 seconds [default0]: number of documents: 2940097 [default0]: > dataset split: [default0]: valid: [default0]: document indices in [0, 2940097) total of 2940097 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.011625 seconds [default0]: number of documents: 2940097 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003065 seconds [default0]: number of documents: 2940097 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_26624ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from 
/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_26624ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.037 seconds [default0]:> finished creating T0 datasets ... [default0]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 
10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default1]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default0]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 
[default4]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default4]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default5]:GOTCONSUMEDSAMPLES 10240 0 [default6]:GOTCONSUMEDSAMPLES 10240 0 [default2]:GOTCONSUMEDSAMPLES 10240 0 [default3]:GOTCONSUMEDSAMPLES 10240 0 [default7]:GOTCONSUMEDSAMPLES 10240 0 [default0]:[000-046] 177.5835B / 177.5835B [default0]:[000-026] 177.5835B / 177.5835B [default4]:[000-019] 177.5835B / 177.5835B [default4]:[000-069] 177.5835B / 177.5835B [default4]:[000-047] 177.5835B / 177.5835B [default4]:[000-041] 177.5835B / 177.5835B [default4]:[000-071] 258.9563B / 0.0000B [default0]:[000-064] 177.5835B / 177.5835B [default4]:[000-065] 177.5835B / 177.5835B [default0]:[000-058] 177.5835B / 177.5835B [default4]:[000-009] 177.5835B / 177.5835B [default0]:[000-070] 177.5855B / 177.5855B [default4]:[000-001] 177.5835B / 177.5835B [default0]:[000-062] 177.5835B / 177.5835B [default0]:[000-030] 177.5835B / 177.5835B [default4]:[000-057] 177.5835B / 177.5835B[default0]:[000-056] 177.5835B / 177.5835B [default4]:[000-023] 177.5835B / 177.5835B [default4]:[000-063] 177.5835B / 177.5835B [default0]:[000-052] 177.5835B / 177.5835B [default4]:[000-029] 177.5835B / 177.5835B [default4]:[000-049] 177.5835B / 177.5835B [default4]:[000-045] 177.5835B / 177.5835B [default0]:[000-060] 177.5835B / 177.5835B [default0]:[000-016] 177.5835B / 177.5835B [default4]:[000-017] 177.5835B / 177.5835B [default0]:[000-010] 177.5835B / 177.5835B [default4]:[000-007] 177.5835B / 177.5835B [default0]:[000-008] 177.5835B / 177.5835B [default4]:[000-013] 177.5835B / 177.5835B [default0]:[000-012] 177.5835B / 177.5835B [default0]:[000-042] 177.5835B / 177.5835B [default0]:[000-028] 177.5835B / 177.5835B [default0]:[000-034] 177.5835B / 177.5835B 
[default0]:[after dataloaders are built] datetime: 2022-09-03 19:59:05 [default0]:done with setup ... [default0]:training ... [default0]:Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: [default0]:[000-000] 258.9584B / 0.0021B [default0]:[before the start of training step] datetime: 2022-09-03 19:59:05 [default0]:[000-018] 177.5835B / 177.5835B [default0]:[000-022] 177.5835B / 177.5835B [default4]:[000-061] 177.5835B / 177.5835B [default0]:[000-024] 177.5835B / 177.5835B [default0]:[000-004] 177.5835B / 177.5835B [default0]:[000-044] 177.5835B / 177.5835B [default4]:[000-031] 177.5835B / 177.5835B [default0]:[000-048] 177.5835B / 177.5835B [default0]:[000-020] 177.5835B / 177.5835B [default0]:[000-038] 177.5835B / 177.5835B [default4]:[000-039] 177.5835B / 177.5835B [default4]:[000-055] 177.5835B / 177.5835B [default4]:[000-005] 177.5835B / 177.5835B [default0]:[000-014] 177.5835B / 177.5835B [default0]:[000-054] 177.5835B / 177.5835B [default4]:[000-043] 177.5835B / 177.5835B [default4]:[000-025] 177.5835B / 177.5835B [default0]:[000-050] 177.5835B / 177.5835B [default0]:[000-040] 177.5835B / 177.5835B [default0]:[000-002] 177.5835B / 177.5835B [default0]:[000-036] 177.5835B / 177.5835B [default7]:time (ms) | model-and-optimizer-setup: 33458.81 | train/valid/test-data-iterators-setup: 10436.93 [default4]:[000-027] 177.5835B / 177.5835B [default0]:[000-066] 177.5835B / 177.5835B [default0]:[000-006] 177.5835B / 177.5835B [default0]:[000-032] 177.5835B / 177.5835B [default4]:[000-033] 177.5835B / 177.5835B [default4]:[000-053] 177.5835B / 177.5835B [default4]:[000-051] 177.5835B / 177.5835B [default4]:[000-003] 177.5835B / 177.5835B [default4]:[000-059] 177.5835B / 177.5835B [default4]:[000-037] 177.5835B / 177.5835B [default0]:[000-068] 177.5835B / 177.5835B [default4]:[000-021] 177.5835B / 177.5835B [default4]:[000-067] 177.5835B / 177.5835B [default4]:[000-015] 177.5835B / 177.5835B [default4]:[000-011] 177.5835B / 177.5835B 
[default4]:[000-035] 177.5835B / 177.5835B [default4]: [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default0]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default7]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default5]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default4]: return self._grad [default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default6]: return self._grad [default5]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) 
[default5]: return self._grad [default7]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default7]: return self._grad [default0]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default0]: return self._grad [default2]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default2]: return self._grad [default3]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default3]: return self._grad [default1]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.) [default1]: return self._grad [default4]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.)
[default4]: return self._grad
[default6]:/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352465323/work/build/aten/src/ATen/core/TensorBody.h:477.)
[default6]: return self._grad
[default0]:[Rank 248] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31054.39501953125 | reserved: 37358.0 | max reserved: 37358.0
[default0]:[Rank 120] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36686.39501953125 | reserved: 42454.0 | max reserved: 42454.0
[default4]:[Rank 252] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30878.39501953125 | reserved: 37358.0 | max reserved: 37358.0
[default0]:[Rank 224] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32110.39501953125 | reserved: 37974.0 | max reserved: 37974.0
[default4]:[Rank 92] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37918.39501953125 | reserved: 43350.0 | max reserved: 43350.0
[default0]:[Rank 240] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31406.39501953125 | reserved: 37078.0 | max reserved: 37078.0
[default0]:[Rank 168] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34574.39501953125 | reserved: 40942.0 | max reserved: 40942.0
[default0]:[Rank 208] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32814.39501953125 | reserved: 39150.0 | max reserved: 39150.0
[default4]:[Rank 196] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33342.39501953125 | reserved: 38870.0 | max reserved: 38870.0
[default4]:[Rank 180] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34046.39501953125 | reserved: 39766.0 | max reserved: 39766.0
[default0]:[Rank 136] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35982.39501953125 | reserved: 41558.0 | max reserved: 41558.0
[default0]:[Rank 32] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40558.39501953125 | reserved: 46038.0 | max reserved: 46038.0
[default4]:[Rank 68] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38974.39501953125 | reserved: 44246.0 | max reserved: 44246.0
[default4]:[Rank 28] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40734.39501953125 | reserved: 46038.0 | max reserved: 46038.0
[default4]:[Rank 116] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36862.39501953125 | reserved: 42454.0 | max reserved: 42454.0
[default4]:[Rank 172] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34398.39501953125 | reserved: 39766.0 | max reserved: 39766.0
[default0]:[Rank 40] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40206.39501953125 | reserved: 46038.0 | max reserved: 46038.0
[default4]:[Rank 52] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39678.39501953125 | reserved: 45142.0 | max reserved: 45142.0
[default0]:[Rank 192] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33518.39501953125 | reserved: 40046.0 | max reserved: 40046.0
[default0]:[Rank 112] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37038.39501953125 | reserved: 42454.0 | max reserved: 42454.0
[default0]:[Rank 48] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39854.39501953125 | reserved: 45142.0 | max reserved: 45142.0
[default0]:[Rank 72] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38798.39501953125 | reserved: 44246.0 | max reserved: 44246.0
[default0]:[Rank 64] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39150.39501953125 | reserved: 45422.0 | max reserved: 45422.0
[default0]:[Rank 144] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35630.39501953125 | reserved: 41558.0 | max reserved: 41558.0
[default0]:[Rank 200] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33166.39501953125 | reserved: 38870.0 | max reserved: 38870.0
[default0]:[Rank 96] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37742.39501953125 | reserved: 43350.0 | max reserved: 43350.0
[default4]:[Rank 220] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32286.39501953125 | reserved: 37974.0 | max reserved: 37974.0
[default0]:[Rank 176] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34222.39501953125 | reserved: 39766.0 | max reserved: 39766.0
[default4]:[Rank 156] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35102.39501953125 | reserved: 40662.0 | max reserved: 40662.0
[default4]:[Rank 244] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31230.39501953125 | reserved: 37078.0 | max reserved: 37078.0
[default0]:[Rank 16] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41262.39501953125 | reserved: 46934.0 | max reserved: 46934.0
[default4]:[Rank 20] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41086.39501953125 | reserved: 46934.0 | max reserved: 46934.0
[default7]: iteration 6/ 3100 | consumed samples: 12288 | consumed tokens: 25165824 | elapsed time per iteration (s): 205.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.771013E+00 | grad norm: 13.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 9.977 | TFLOPs: 101.85 |
[default4]:[Rank 124] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36510.39501953125 | reserved: 42454.0 | max reserved: 42454.0
[default0]:[Rank 88] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38094.39501953125 | reserved: 44526.0 | max reserved: 44526.0
[default0]:[Rank 80] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38446.39501953125 | reserved: 44246.0 | max reserved: 44246.0
[default0]:[Rank 152] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35278.39501953125 | reserved: 40662.0 | max reserved: 40662.0
[default0]:[Rank 24] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40910.39501953125 | reserved: 47214.0 | max reserved: 47214.0
[default4]:[Rank 108] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37214.39501953125 | reserved: 43630.0 | max reserved: 43630.0
[default0]:[Rank 160] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34926.39501953125 | reserved: 40662.0 | max reserved: 40662.0
[default0]:[Rank 216] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32462.39501953125 | reserved: 37974.0 | max reserved: 37974.0
[default0]:[Rank 8] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41614.39501953125 | reserved: 46934.0 | max reserved: 46934.0
[default0]:[Rank 264] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30350.39501953125 | reserved: 36182.0 | max reserved: 36182.0
[default0]:[Rank 128] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36334.39501953125 | reserved: 42734.0 | max reserved: 42734.0
[default4]:[Rank 236] (after 6 iterations) memory (MB) | allocated:
25990.39599609375 | max allocated: 31582.39501953125 | reserved: 37078.0 | max reserved: 37078.0 [default4]:[Rank 132] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 36158.39501953125 | reserved: 41558.0 | max reserved: 41558.0 [default4]:[Rank 12] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41438.39501953125 | reserved: 46934.0 | max reserved: 46934.0 [default4]:[Rank 100] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37566.39501953125 | reserved: 43350.0 | max reserved: 43350.0 [default4]:[Rank 204] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32990.39501953125 | reserved: 38870.0 | max reserved: 38870.0 [default4]:[Rank 76] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38622.39501953125 | reserved: 44246.0 | max reserved: 44246.0 [default4]:[Rank 140] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35806.39501953125 | reserved: 41558.0 | max reserved: 41558.0 [default0]:[Rank 184] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33870.39501953125 | reserved: 39766.0 | max reserved: 39766.0 [default0]:[Rank 0] (after 6 iterations) memory (MB) | allocated: 38080.58544921875 | max allocated: 62086.80322265625 | reserved: 76022.0 | max reserved: 76022.0 [default0]:[Rank 256] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30702.39501953125 | reserved: 37358.0 | max reserved: 37358.0 [default4]:[Rank 84] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 38270.39501953125 | reserved: 44526.0 | max reserved: 44526.0 [default0]:[Rank 272] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 29998.39501953125 | reserved: 35286.0 | max reserved: 35286.0 [default0]:[Rank 56] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max 
allocated: 39502.39501953125 | reserved: 45142.0 | max reserved: 45142.0 [default4]:[Rank 60] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 39326.39501953125 | reserved: 45142.0 | max reserved: 45142.0 [default4]:[Rank 212] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 32638.39501953125 | reserved: 37974.0 | max reserved: 37974.0 [default4]:[Rank 148] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 35454.39501953125 | reserved: 41838.0 | max reserved: 41838.0 [default4]:[Rank 284] (after 6 iterations) memory (MB) | allocated: 41930.33251953125 | max allocated: 55650.33203125 | reserved: 73748.0 | max reserved: 73748.0 [default4]:[Rank 260] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30526.39501953125 | reserved: 36182.0 | max reserved: 36182.0 [default4]:[Rank 44] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40030.39501953125 | reserved: 46318.0 | max reserved: 46318.0 [default4]:[Rank 268] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 30174.39501953125 | reserved: 35286.0 | max reserved: 35286.0 [default0]:[Rank 232] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31758.39501953125 | reserved: 38254.0 | max reserved: 38254.0 [default0]:[Rank 280] (after 6 iterations) memory (MB) | allocated: 25990.69677734375 | max allocated: 29702.71142578125 | reserved: 35286.0 | max reserved: 35286.0 [default4]:[Rank 164] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 34750.39501953125 | reserved: 40942.0 | max reserved: 40942.0 [default4]:[Rank 228] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 31934.39501953125 | reserved: 38254.0 | max reserved: 38254.0 [default4]:[Rank 276] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 
29822.39501953125 | reserved: 35286.0 | max reserved: 35286.0 [default0]:[Rank 104] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 37390.39501953125 | reserved: 43350.0 | max reserved: 43350.0 [default4]:[Rank 36] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 40382.39501953125 | reserved: 46038.0 | max reserved: 46038.0 [default4]:[Rank 188] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 33694.39501953125 | reserved: 40046.0 | max reserved: 40046.0 [default4]:[Rank 4] (after 6 iterations) memory (MB) | allocated: 25990.39599609375 | max allocated: 41790.39501953125 | reserved: 48110.0 | max reserved: 48110.0 [default7]: iteration 7/ 3100 | consumed samples: 14336 | consumed tokens: 29360128 | elapsed time per iteration (s): 142.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.375984E+00 | grad norm: 8.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.365 | TFLOPs: 146.64 | [default7]: iteration 8/ 3100 | consumed samples: 16384 | consumed tokens: 33554432 | elapsed time per iteration (s): 141.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.170504E+00 | grad norm: 2.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.517 | TFLOPs: 148.20 | [default7]: iteration 9/ 3100 | consumed samples: 18432 | consumed tokens: 37748736 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.123306E+00 | grad norm: 4.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 10/ 3100 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.097810E+00 | grad norm: 1.994 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 11/ 3100 | consumed samples: 22528 | consumed tokens: 46137344 | elapsed time per iteration (s): 142.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.055027E+00 | grad norm: 1.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.326 | TFLOPs: 146.25 | [default7]: iteration 12/ 3100 | consumed samples: 24576 | consumed tokens: 50331648 | elapsed time per iteration (s): 143.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.039329E+00 | grad norm: 2.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.304 | TFLOPs: 146.02 | [default7]: iteration 13/ 3100 | consumed samples: 26624 | consumed tokens: 54525952 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 1.007910E+00 | grad norm: 1.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 14/ 3100 | consumed samples: 28672 | consumed tokens: 58720256 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.739922E-01 | grad norm: 1.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 15/ 3100 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 143.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.744506E-01 | grad norm: 3.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.295 | TFLOPs: 145.93 | [default7]: iteration 16/ 3100 | consumed samples: 32768 | consumed tokens: 67108864 | elapsed time per iteration (s): 
142.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.691636E-01 | grad norm: 1.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.379 | TFLOPs: 146.79 | [default7]: iteration 17/ 3100 | consumed samples: 34816 | consumed tokens: 71303168 | elapsed time per iteration (s): 142.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.394557E-01 | grad norm: 1.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.379 | TFLOPs: 146.79 | [default7]: iteration 18/ 3100 | consumed samples: 36864 | consumed tokens: 75497472 | elapsed time per iteration (s): 153.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.326955E-01 | grad norm: 1.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.353 | TFLOPs: 136.31 | [default7]: iteration 19/ 3100 | consumed samples: 38912 | consumed tokens: 79691776 | elapsed time per iteration (s): 168.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.258911E-01 | grad norm: 0.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.152 | TFLOPs: 124.05 | [default7]: iteration 20/ 3100 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 152.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.146605E-01 | grad norm: 0.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.420 | TFLOPs: 137.00 | [default7]: iteration 21/ 3100 | consumed samples: 43008 | consumed tokens: 88080384 | elapsed time per iteration (s): 167.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.092812E-01 | grad norm: 1.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.254 | TFLOPs: 125.09 | [default7]: 
iteration 22/ 3100 | consumed samples: 45056 | consumed tokens: 92274688 | elapsed time per iteration (s): 165.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.049101E-01 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.377 | TFLOPs: 126.35 | [default7]: iteration 23/ 3100 | consumed samples: 47104 | consumed tokens: 96468992 | elapsed time per iteration (s): 145.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 9.032789E-01 | grad norm: 1.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.080 | TFLOPs: 143.73 | [default7]: iteration 24/ 3100 | consumed samples: 49152 | consumed tokens: 100663296 | elapsed time per iteration (s): 155.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.886566E-01 | grad norm: 0.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.184 | TFLOPs: 134.59 | [default7]: iteration 25/ 3100 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 165.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.854945E-01 | grad norm: 0.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.355 | TFLOPs: 126.13 | [default7]: iteration 26/ 3100 | consumed samples: 53248 | consumed tokens: 109051904 | elapsed time per iteration (s): 157.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.855023E-01 | grad norm: 3.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.044 | TFLOPs: 133.16 | [default7]: iteration 27/ 3100 | consumed samples: 55296 | consumed tokens: 113246208 | elapsed time per iteration (s): 150.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.900498E-01 | grad norm: 1.261 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.634 | TFLOPs: 139.18 | [default7]: iteration 28/ 3100 | consumed samples: 57344 | consumed tokens: 117440512 | elapsed time per iteration (s): 146.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.823168E-01 | grad norm: 1.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.000 | TFLOPs: 142.92 | [default7]: iteration 29/ 3100 | consumed samples: 59392 | consumed tokens: 121634816 | elapsed time per iteration (s): 151.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.847864E-01 | grad norm: 47.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.557 | TFLOPs: 138.39 | [default7]: iteration 30/ 3100 | consumed samples: 61440 | consumed tokens: 125829120 | elapsed time per iteration (s): 146.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.862412E-01 | grad norm: 4.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.936 | TFLOPs: 142.26 | [default7]: iteration 31/ 3100 | consumed samples: 63488 | consumed tokens: 130023424 | elapsed time per iteration (s): 146.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.944387E-01 | grad norm: 4.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.982 | TFLOPs: 142.74 | [default7]: iteration 32/ 3100 | consumed samples: 65536 | consumed tokens: 134217728 | elapsed time per iteration (s): 142.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.892818E-01 | grad norm: 2.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.352 | TFLOPs: 146.51 | [default7]: iteration 33/ 3100 | consumed samples: 67584 | consumed tokens: 138412032 | elapsed time per iteration (s): 143.27 | learning 
rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.772181E-01 | grad norm: 5.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.294 | TFLOPs: 145.92 | [default7]: iteration 34/ 3100 | consumed samples: 69632 | consumed tokens: 142606336 | elapsed time per iteration (s): 142.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.878008E-01 | grad norm: 1.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.356 | TFLOPs: 146.55 | [default7]: iteration 35/ 3100 | consumed samples: 71680 | consumed tokens: 146800640 | elapsed time per iteration (s): 142.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.771838E-01 | grad norm: 1.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.369 | TFLOPs: 146.68 | [default7]: iteration 36/ 3100 | consumed samples: 73728 | consumed tokens: 150994944 | elapsed time per iteration (s): 142.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.743373E-01 | grad norm: 1.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.385 | TFLOPs: 146.84 | [default7]: iteration 37/ 3100 | consumed samples: 75776 | consumed tokens: 155189248 | elapsed time per iteration (s): 146.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.600196E-01 | grad norm: 1.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.936 | TFLOPs: 142.26 | [default7]: iteration 38/ 3100 | consumed samples: 77824 | consumed tokens: 159383552 | elapsed time per iteration (s): 165.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.740050E-01 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.341 | TFLOPs: 125.98 | [default7]: iteration 39/ 
3100 | consumed samples: 79872 | consumed tokens: 163577856 | elapsed time per iteration (s): 143.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.669653E-01 | grad norm: 0.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.270 | TFLOPs: 145.68 | [default7]: iteration 40/ 3100 | consumed samples: 81920 | consumed tokens: 167772160 | elapsed time per iteration (s): 152.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.599278E-01 | grad norm: 1.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.391 | TFLOPs: 136.71 | [default7]: iteration 41/ 3100 | consumed samples: 83968 | consumed tokens: 171966464 | elapsed time per iteration (s): 151.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.839574E-01 | grad norm: 4.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.536 | TFLOPs: 138.19 | [default7]: iteration 42/ 3100 | consumed samples: 86016 | consumed tokens: 176160768 | elapsed time per iteration (s): 147.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.587544E-01 | grad norm: 1.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.924 | TFLOPs: 142.15 | [default7]: iteration 43/ 3100 | consumed samples: 88064 | consumed tokens: 180355072 | elapsed time per iteration (s): 154.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.700029E-01 | grad norm: 1.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.231 | TFLOPs: 135.07 | [default7]: iteration 44/ 3100 | consumed samples: 90112 | consumed tokens: 184549376 | elapsed time per iteration (s): 146.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.517140E-01 | grad norm: 0.904 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.009 | TFLOPs: 143.01 | [default7]: iteration 45/ 3100 | consumed samples: 92160 | consumed tokens: 188743680 | elapsed time per iteration (s): 149.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.509599E-01 | grad norm: 0.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.713 | TFLOPs: 139.99 | [default7]: iteration 46/ 3100 | consumed samples: 94208 | consumed tokens: 192937984 | elapsed time per iteration (s): 148.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.610727E-01 | grad norm: 0.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.835 | TFLOPs: 141.23 | [default7]: iteration 47/ 3100 | consumed samples: 96256 | consumed tokens: 197132288 | elapsed time per iteration (s): 144.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.430433E-01 | grad norm: 2.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.178 | TFLOPs: 144.74 | [default7]: iteration 48/ 3100 | consumed samples: 98304 | consumed tokens: 201326592 | elapsed time per iteration (s): 148.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.461150E-01 | grad norm: 0.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.800 | TFLOPs: 140.87 | [default7]: iteration 49/ 3100 | consumed samples: 100352 | consumed tokens: 205520896 | elapsed time per iteration (s): 150.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.324211E-01 | grad norm: 0.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.597 | TFLOPs: 138.80 | [default7]: iteration 50/ 3100 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 147.11 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 8.365496E-01 | grad norm: 0.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.922 | TFLOPs: 142.12 | [default7]: iteration 51/ 3100 | consumed samples: 104448 | consumed tokens: 213909504 | elapsed time per iteration (s): 141.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.354586E-01 | grad norm: 1.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.441 | TFLOPs: 147.42 | [default7]: iteration 52/ 3100 | consumed samples: 106496 | consumed tokens: 218103808 | elapsed time per iteration (s): 143.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.326620E-01 | grad norm: 1.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.288 | TFLOPs: 145.86 | [default7]: iteration 53/ 3100 | consumed samples: 108544 | consumed tokens: 222298112 | elapsed time per iteration (s): 141.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.371478E-01 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.519 | TFLOPs: 148.22 | [default7]: iteration 54/ 3100 | consumed samples: 110592 | consumed tokens: 226492416 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.376181E-01 | grad norm: 0.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 55/ 3100 | consumed samples: 112640 | consumed tokens: 230686720 | elapsed time per iteration (s): 142.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.294340E-01 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.401 | TFLOPs: 147.01 | [default7]: iteration 56/ 
3100 | consumed samples: 114688 | consumed tokens: 234881024 | elapsed time per iteration (s): 144.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.245198E-01 | grad norm: 1.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.183 | TFLOPs: 144.79 | [default7]: iteration 57/ 3100 | consumed samples: 116736 | consumed tokens: 239075328 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.275281E-01 | grad norm: 1.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.76 | [default7]: iteration 58/ 3100 | consumed samples: 118784 | consumed tokens: 243269632 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.247311E-01 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 59/ 3100 | consumed samples: 120832 | consumed tokens: 247463936 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.308536E-01 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 60/ 3100 | consumed samples: 122880 | consumed tokens: 251658240 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.320975E-01 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.450 | TFLOPs: 147.51 | [default7]: iteration 61/ 3100 | consumed samples: 124928 | consumed tokens: 255852544 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.302983E-01 | grad norm: 0.539 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 62/ 3100 | consumed samples: 126976 | consumed tokens: 260046848 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.322481E-01 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.468 | TFLOPs: 147.70 | [default7]: iteration 63/ 3100 | consumed samples: 129024 | consumed tokens: 264241152 | elapsed time per iteration (s): 141.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.147445E-01 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.424 | TFLOPs: 147.24 | [default7]: iteration 64/ 3100 | consumed samples: 131072 | consumed tokens: 268435456 | elapsed time per iteration (s): 141.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.213267E-01 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.467 | TFLOPs: 147.68 | [default7]: iteration 65/ 3100 | consumed samples: 133120 | consumed tokens: 272629760 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.258882E-01 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.02 | [default7]: iteration 66/ 3100 | consumed samples: 135168 | consumed tokens: 276824064 | elapsed time per iteration (s): 140.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.303090E-01 | grad norm: 0.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.532 | TFLOPs: 148.35 | [default7]: iteration 67/ 3100 | consumed samples: 137216 | consumed tokens: 281018368 | elapsed time per iteration (s): 141.88 | learning 
rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.223020E-01 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 68/ 3100 | consumed samples: 139264 | consumed tokens: 285212672 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.176242E-01 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 69/ 3100 | consumed samples: 141312 | consumed tokens: 289406976 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.180224E-01 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 70/ 3100 | consumed samples: 143360 | consumed tokens: 293601280 | elapsed time per iteration (s): 141.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.229319E-01 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.454 | TFLOPs: 147.55 | [default7]: iteration 71/ 3100 | consumed samples: 145408 | consumed tokens: 297795584 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.307614E-01 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 72/ 3100 | consumed samples: 147456 | consumed tokens: 301989888 | elapsed time per iteration (s): 143.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.147565E-01 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.308 | TFLOPs: 146.06 | [default7]: 
iteration 73/ 3100 | consumed samples: 149504 | consumed tokens: 306184192 | elapsed time per iteration (s): 142.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.092042E-01 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.339 | TFLOPs: 146.38 | [default7]: iteration 74/ 3100 | consumed samples: 151552 | consumed tokens: 310378496 | elapsed time per iteration (s): 142.75 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.137342E-01 | grad norm: 0.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.347 | TFLOPs: 146.46 | [default7]: iteration 75/ 3100 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 141.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.117334E-01 | grad norm: 0.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.07 | [default7]: iteration 76/ 3100 | consumed samples: 155648 | consumed tokens: 318767104 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.266841E-01 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.14 | [default7]: iteration 77/ 3100 | consumed samples: 157696 | consumed tokens: 322961408 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.002620E-01 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.81 | [default7]: iteration 78/ 3100 | consumed samples: 159744 | consumed tokens: 327155712 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.137600E-01 | grad norm: 0.577 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 79/ 3100 | consumed samples: 161792 | consumed tokens: 331350016 | elapsed time per iteration (s): 142.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.005607E-01 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.389 | TFLOPs: 146.89 | [default7]: iteration 80/ 3100 | consumed samples: 163840 | consumed tokens: 335544320 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.107273E-01 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.15 | [default7]: iteration 81/ 3100 | consumed samples: 165888 | consumed tokens: 339738624 | elapsed time per iteration (s): 141.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.073310E-01 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.430 | TFLOPs: 147.31 | [default7]: iteration 82/ 3100 | consumed samples: 167936 | consumed tokens: 343932928 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.035055E-01 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 83/ 3100 | consumed samples: 169984 | consumed tokens: 348127232 | elapsed time per iteration (s): 142.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.985230E-01 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.389 | TFLOPs: 146.89 | [default7]: iteration 84/ 3100 | consumed samples: 172032 | consumed tokens: 352321536 | elapsed time per iteration (s): 
143.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.055478E-01 | grad norm: 1.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.315 | TFLOPs: 146.14 | [default7]: iteration 85/ 3100 | consumed samples: 174080 | consumed tokens: 356515840 | elapsed time per iteration (s): 140.97 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.079987E-01 | grad norm: 0.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.528 | TFLOPs: 148.31 | [default7]: iteration 86/ 3100 | consumed samples: 176128 | consumed tokens: 360710144 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.902204E-01 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 87/ 3100 | consumed samples: 178176 | consumed tokens: 364904448 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.062608E-01 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 88/ 3100 | consumed samples: 180224 | consumed tokens: 369098752 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.070796E-01 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 89/ 3100 | consumed samples: 182272 | consumed tokens: 373293056 | elapsed time per iteration (s): 143.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.993592E-01 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.315 | TFLOPs: 146.13 | 
[default7]: iteration 90/ 3100 | consumed samples: 184320 | consumed tokens: 377487360 | elapsed time per iteration (s): 142.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.959936E-01 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.33 | [default7]: iteration 91/ 3100 | consumed samples: 186368 | consumed tokens: 381681664 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.962327E-01 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 92/ 3100 | consumed samples: 188416 | consumed tokens: 385875968 | elapsed time per iteration (s): 141.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.960305E-01 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.464 | TFLOPs: 147.65 | [default7]: iteration 93/ 3100 | consumed samples: 190464 | consumed tokens: 390070272 | elapsed time per iteration (s): 141.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.899939E-01 | grad norm: 0.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.508 | TFLOPs: 148.11 | [default7]: iteration 94/ 3100 | consumed samples: 192512 | consumed tokens: 394264576 | elapsed time per iteration (s): 141.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.955246E-01 | grad norm: 0.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.423 | TFLOPs: 147.24 | [default7]: iteration 95/ 3100 | consumed samples: 194560 | consumed tokens: 398458880 | elapsed time per iteration (s): 143.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.031546E-01 | grad norm: 0.921 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.310 | TFLOPs: 146.08 | [default7]: iteration 96/ 3100 | consumed samples: 196608 | consumed tokens: 402653184 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.012702E-01 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 97/ 3100 | consumed samples: 198656 | consumed tokens: 406847488 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 8.006363E-01 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 98/ 3100 | consumed samples: 200704 | consumed tokens: 411041792 | elapsed time per iteration (s): 142.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.847059E-01 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.422 | TFLOPs: 147.23 | [default7]: iteration 99/ 3100 | consumed samples: 202752 | consumed tokens: 415236096 | elapsed time per iteration (s): 142.94 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.946941E-01 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.328 | TFLOPs: 146.27 | [default7]: iteration 100/ 3100 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 142.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.932592E-01 | grad norm: 0.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.372 | TFLOPs: 146.71 | [default7]: iteration 101/ 3100 | consumed samples: 206848 | consumed tokens: 423624704 | elapsed time per 
iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.903042E-01 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 102/ 3100 | consumed samples: 208896 | consumed tokens: 427819008 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.961559E-01 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 | [default7]: iteration 103/ 3100 | consumed samples: 210944 | consumed tokens: 432013312 | elapsed time per iteration (s): 142.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.969423E-01 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.399 | TFLOPs: 146.99 | [default7]: iteration 104/ 3100 | consumed samples: 212992 | consumed tokens: 436207616 | elapsed time per iteration (s): 141.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.896079E-01 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.453 | TFLOPs: 147.55 | [default7]: iteration 105/ 3100 | consumed samples: 215040 | consumed tokens: 440401920 | elapsed time per iteration (s): 142.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.832251E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.326 | TFLOPs: 146.25 | [default7]: iteration 106/ 3100 | consumed samples: 217088 | consumed tokens: 444596224 | elapsed time per iteration (s): 141.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.852587E-01 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.517 | 
TFLOPs: 148.20 | [default7]: iteration 107/ 3100 | consumed samples: 219136 | consumed tokens: 448790528 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.965346E-01 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 108/ 3100 | consumed samples: 221184 | consumed tokens: 452984832 | elapsed time per iteration (s): 140.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.814828E-01 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.536 | TFLOPs: 148.39 | [default7]: iteration 109/ 3100 | consumed samples: 223232 | consumed tokens: 457179136 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.864393E-01 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.32 | [default7]: iteration 110/ 3100 | consumed samples: 225280 | consumed tokens: 461373440 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.880430E-01 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 111/ 3100 | consumed samples: 227328 | consumed tokens: 465567744 | elapsed time per iteration (s): 143.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.876133E-01 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.317 | TFLOPs: 146.15 | [default7]: iteration 112/ 3100 | consumed samples: 229376 | consumed tokens: 469762048 | elapsed time per iteration (s): 143.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.786481E-01 | grad 
norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.294 | TFLOPs: 145.92 | [default7]: iteration 113/ 3100 | consumed samples: 231424 | consumed tokens: 473956352 | elapsed time per iteration (s): 142.81 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.758213E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.341 | TFLOPs: 146.40 | [default7]: iteration 114/ 3100 | consumed samples: 233472 | consumed tokens: 478150656 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.874675E-01 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 115/ 3100 | consumed samples: 235520 | consumed tokens: 482344960 | elapsed time per iteration (s): 142.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.731553E-01 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.406 | TFLOPs: 147.06 | [default7]: iteration 116/ 3100 | consumed samples: 237568 | consumed tokens: 486539264 | elapsed time per iteration (s): 141.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.775542E-01 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.520 | TFLOPs: 148.23 | [default7]: iteration 117/ 3100 | consumed samples: 239616 | consumed tokens: 490733568 | elapsed time per iteration (s): 141.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.895082E-01 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 | [default7]: iteration 118/ 3100 | consumed samples: 241664 | consumed tokens: 494927872 
| elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.796859E-01 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 119/ 3100 | consumed samples: 243712 | consumed tokens: 499122176 | elapsed time per iteration (s): 141.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.821237E-01 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.423 | TFLOPs: 147.24 | [default7]: iteration 120/ 3100 | consumed samples: 245760 | consumed tokens: 503316480 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.788951E-01 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 121/ 3100 | consumed samples: 247808 | consumed tokens: 507510784 | elapsed time per iteration (s): 141.02 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.841722E-01 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.522 | TFLOPs: 148.25 | [default7]: iteration 122/ 3100 | consumed samples: 249856 | consumed tokens: 511705088 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.755765E-01 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.403 | TFLOPs: 147.03 | [default7]: iteration 123/ 3100 | consumed samples: 251904 | consumed tokens: 515899392 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.878202E-01 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 14.466 | TFLOPs: 147.68 | [default7]: iteration 124/ 3100 | consumed samples: 253952 | consumed tokens: 520093696 | elapsed time per iteration (s): 142.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.719796E-01 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.410 | TFLOPs: 147.10 | [default7]: iteration 125/ 3100 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 142.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.759538E-01 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.353 | TFLOPs: 146.52 | [default7]: iteration 126/ 3100 | consumed samples: 258048 | consumed tokens: 528482304 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.725249E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 127/ 3100 | consumed samples: 260096 | consumed tokens: 532676608 | elapsed time per iteration (s): 142.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.779995E-01 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.396 | TFLOPs: 146.96 | [default7]: iteration 128/ 3100 | consumed samples: 262144 | consumed tokens: 536870912 | elapsed time per iteration (s): 142.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.826483E-01 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.401 | TFLOPs: 147.01 | [default7]: iteration 129/ 3100 | consumed samples: 264192 | consumed tokens: 541065216 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 7.692418E-01 | grad norm: 0.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 130/ 3100 | consumed samples: 266240 | consumed tokens: 545259520 | elapsed time per iteration (s): 142.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.649554E-01 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.360 | TFLOPs: 146.59 | [default7]: iteration 131/ 3100 | consumed samples: 268288 | consumed tokens: 549453824 | elapsed time per iteration (s): 142.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.738269E-01 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.330 | TFLOPs: 146.29 | [default7]: iteration 132/ 3100 | consumed samples: 270336 | consumed tokens: 553648128 | elapsed time per iteration (s): 142.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.944962E-01 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.406 | TFLOPs: 147.06 | [default7]: iteration 133/ 3100 | consumed samples: 272384 | consumed tokens: 557842432 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.788242E-01 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 134/ 3100 | consumed samples: 274432 | consumed tokens: 562036736 | elapsed time per iteration (s): 142.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.809464E-01 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.368 | TFLOPs: 146.68 | [default7]: iteration 135/ 3100 | consumed samples: 276480 | 
consumed tokens: 566231040 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.653691E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.73 | [default7]: iteration 136/ 3100 | consumed samples: 278528 | consumed tokens: 570425344 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.711610E-01 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 137/ 3100 | consumed samples: 280576 | consumed tokens: 574619648 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.718136E-01 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 138/ 3100 | consumed samples: 282624 | consumed tokens: 578813952 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.615272E-01 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 | [default7]: iteration 139/ 3100 | consumed samples: 284672 | consumed tokens: 583008256 | elapsed time per iteration (s): 142.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.669709E-01 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.327 | TFLOPs: 146.26 | [default7]: iteration 140/ 3100 | consumed samples: 286720 | consumed tokens: 587202560 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.722347E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 141/ 3100 | consumed samples: 288768 | consumed tokens: 591396864 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.675107E-01 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 142/ 3100 | consumed samples: 290816 | consumed tokens: 595591168 | elapsed time per iteration (s): 143.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.711003E-01 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.294 | TFLOPs: 145.92 | [default7]: iteration 143/ 3100 | consumed samples: 292864 | consumed tokens: 599785472 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.610846E-01 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 144/ 3100 | consumed samples: 294912 | consumed tokens: 603979776 | elapsed time per iteration (s): 141.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.634090E-01 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.424 | TFLOPs: 147.25 | [default7]: iteration 145/ 3100 | consumed samples: 296960 | consumed tokens: 608174080 | elapsed time per iteration (s): 142.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.612107E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.390 | TFLOPs: 146.90 | [default7]: iteration 146/ 3100 | consumed samples: 299008 | consumed tokens: 612368384 | elapsed time per iteration (s): 142.71 | learning rate: 2.000E-05 | 
global batch size: 2048 | lm loss: 7.703784E-01 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.351 | TFLOPs: 146.50 | [default7]: iteration 147/ 3100 | consumed samples: 301056 | consumed tokens: 616562688 | elapsed time per iteration (s): 141.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.674529E-01 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.516 | TFLOPs: 148.18 | [default7]: iteration 148/ 3100 | consumed samples: 303104 | consumed tokens: 620756992 | elapsed time per iteration (s): 142.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.773187E-01 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.335 | TFLOPs: 146.34 | [default7]: iteration 149/ 3100 | consumed samples: 305152 | consumed tokens: 624951296 | elapsed time per iteration (s): 142.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.745774E-01 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.356 | TFLOPs: 146.55 | [default7]: iteration 150/ 3100 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 141.81 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.738646E-01 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.442 | TFLOPs: 147.43 | [default7]: iteration 151/ 3100 | consumed samples: 309248 | consumed tokens: 633339904 | elapsed time per iteration (s): 141.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.684091E-01 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.518 | TFLOPs: 148.20 | [default7]: iteration 152/ 3100 | 
consumed samples: 311296 | consumed tokens: 637534208 | elapsed time per iteration (s): 142.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.656068E-01 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.418 | TFLOPs: 147.19 | [default7]: iteration 153/ 3100 | consumed samples: 313344 | consumed tokens: 641728512 | elapsed time per iteration (s): 141.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.597326E-01 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 | [default7]: iteration 154/ 3100 | consumed samples: 315392 | consumed tokens: 645922816 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.694619E-01 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.05 | [default7]: iteration 155/ 3100 | consumed samples: 317440 | consumed tokens: 650117120 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.686182E-01 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 156/ 3100 | consumed samples: 319488 | consumed tokens: 654311424 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.609651E-01 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.474 | TFLOPs: 147.76 | [default7]: iteration 157/ 3100 | consumed samples: 321536 | consumed tokens: 658505728 | elapsed time per iteration (s): 140.97 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.687255E-01 | grad norm: 0.482 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.528 | TFLOPs: 148.30 | [default7]: iteration 158/ 3100 | consumed samples: 323584 | consumed tokens: 662700032 | elapsed time per iteration (s): 143.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.731535E-01 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.317 | TFLOPs: 146.16 | [default7]: iteration 159/ 3100 | consumed samples: 325632 | consumed tokens: 666894336 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.623107E-01 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 160/ 3100 | consumed samples: 327680 | consumed tokens: 671088640 | elapsed time per iteration (s): 142.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.640601E-01 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.393 | TFLOPs: 146.93 | [default7]: iteration 161/ 3100 | consumed samples: 329728 | consumed tokens: 675282944 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.660369E-01 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 162/ 3100 | consumed samples: 331776 | consumed tokens: 679477248 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.504568E-01 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 163/ 3100 | consumed samples: 333824 | consumed tokens: 683671552 | elapsed time per iteration (s): 141.53 | 
learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.689477E-01 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.72 | [default7]: iteration 164/ 3100 | consumed samples: 335872 | consumed tokens: 687865856 | elapsed time per iteration (s): 142.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.638633E-01 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.403 | TFLOPs: 147.03 | [default7]: iteration 165/ 3100 | consumed samples: 337920 | consumed tokens: 692060160 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.631855E-01 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 166/ 3100 | consumed samples: 339968 | consumed tokens: 696254464 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.548810E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.80 | [default7]: iteration 167/ 3100 | consumed samples: 342016 | consumed tokens: 700448768 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.618242E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 168/ 3100 | consumed samples: 344064 | consumed tokens: 704643072 | elapsed time per iteration (s): 143.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.542757E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.321 | TFLOPs: 146.19 | 
[default7]: iteration 169/ 3100 | consumed samples: 346112 | consumed tokens: 708837376 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.527310E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 170/ 3100 | consumed samples: 348160 | consumed tokens: 713031680 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.574461E-01 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 171/ 3100 | consumed samples: 350208 | consumed tokens: 717225984 | elapsed time per iteration (s): 142.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.580831E-01 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.345 | TFLOPs: 146.44 | [default7]: iteration 172/ 3100 | consumed samples: 352256 | consumed tokens: 721420288 | elapsed time per iteration (s): 141.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.600464E-01 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.446 | TFLOPs: 147.47 | [default7]: iteration 173/ 3100 | consumed samples: 354304 | consumed tokens: 725614592 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.626750E-01 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.33 | [default7]: iteration 174/ 3100 | consumed samples: 356352 | consumed tokens: 729808896 | elapsed time per iteration (s): 142.94 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.496827E-01 | grad norm: 0.535 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.328 | TFLOPs: 146.27 | [default7]: iteration 175/ 3100 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 140.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.639470E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.530 | TFLOPs: 148.33 | [default7]: iteration 176/ 3100 | consumed samples: 360448 | consumed tokens: 738197504 | elapsed time per iteration (s): 141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.449614E-01 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 177/ 3100 | consumed samples: 362496 | consumed tokens: 742391808 | elapsed time per iteration (s): 143.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.614498E-01 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.318 | TFLOPs: 146.16 | [default7]: iteration 178/ 3100 | consumed samples: 364544 | consumed tokens: 746586112 | elapsed time per iteration (s): 142.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.537661E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.323 | TFLOPs: 146.22 | [default7]: iteration 179/ 3100 | consumed samples: 366592 | consumed tokens: 750780416 | elapsed time per iteration (s): 141.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.570555E-01 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.525 | TFLOPs: 148.28 | [default7]: iteration 180/ 3100 | consumed samples: 368640 | consumed tokens: 754974720 | elapsed time 
per iteration (s): 142.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.637653E-01 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.379 | TFLOPs: 146.78 | [default7]: iteration 181/ 3100 | consumed samples: 370688 | consumed tokens: 759169024 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.513478E-01 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 182/ 3100 | consumed samples: 372736 | consumed tokens: 763363328 | elapsed time per iteration (s): 142.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.590591E-01 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.398 | TFLOPs: 146.98 | [default7]: iteration 183/ 3100 | consumed samples: 374784 | consumed tokens: 767557632 | elapsed time per iteration (s): 142.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.524537E-01 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.392 | TFLOPs: 146.92 | [default7]: iteration 184/ 3100 | consumed samples: 376832 | consumed tokens: 771751936 | elapsed time per iteration (s): 142.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.525125E-01 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.367 | TFLOPs: 146.67 | [default7]: iteration 185/ 3100 | consumed samples: 378880 | consumed tokens: 775946240 | elapsed time per iteration (s): 141.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.526459E-01 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
14.445 | TFLOPs: 147.46 | [default7]: iteration 186/ 3100 | consumed samples: 380928 | consumed tokens: 780140544 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.588501E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 187/ 3100 | consumed samples: 382976 | consumed tokens: 784334848 | elapsed time per iteration (s): 141.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.537382E-01 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.441 | TFLOPs: 147.42 | [default7]: iteration 188/ 3100 | consumed samples: 385024 | consumed tokens: 788529152 | elapsed time per iteration (s): 141.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.539912E-01 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 | [default7]: iteration 189/ 3100 | consumed samples: 387072 | consumed tokens: 792723456 | elapsed time per iteration (s): 140.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.562173E-01 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.536 | TFLOPs: 148.39 | [default7]: iteration 190/ 3100 | consumed samples: 389120 | consumed tokens: 796917760 | elapsed time per iteration (s): 141.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.329869E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.515 | TFLOPs: 148.17 | [default7]: iteration 191/ 3100 | consumed samples: 391168 | consumed tokens: 801112064 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
7.529922E-01 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 192/ 3100 | consumed samples: 393216 | consumed tokens: 805306368 | elapsed time per iteration (s): 142.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.529367E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.418 | TFLOPs: 147.19 | [default7]: iteration 193/ 3100 | consumed samples: 395264 | consumed tokens: 809500672 | elapsed time per iteration (s): 142.84 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.559803E-01 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.338 | TFLOPs: 146.37 | [default7]: iteration 194/ 3100 | consumed samples: 397312 | consumed tokens: 813694976 | elapsed time per iteration (s): 141.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.391642E-01 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.520 | TFLOPs: 148.22 | [default7]: iteration 195/ 3100 | consumed samples: 399360 | consumed tokens: 817889280 | elapsed time per iteration (s): 142.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.521845E-01 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.399 | TFLOPs: 146.99 | [default7]: iteration 196/ 3100 | consumed samples: 401408 | consumed tokens: 822083584 | elapsed time per iteration (s): 141.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.553992E-01 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.519 | TFLOPs: 148.22 | [default7]: iteration 197/ 3100 | consumed samples: 403456 | 
consumed tokens: 826277888 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.532106E-01 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 198/ 3100 | consumed samples: 405504 | consumed tokens: 830472192 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.498717E-01 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.513 | TFLOPs: 148.16 | [default7]: iteration 199/ 3100 | consumed samples: 407552 | consumed tokens: 834666496 | elapsed time per iteration (s): 141.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.545093E-01 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 | [default7]: iteration 200/ 3100 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.526641E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 | [default7]: iteration 201/ 3100 | consumed samples: 411648 | consumed tokens: 843055104 | elapsed time per iteration (s): 142.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.453357E-01 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.366 | TFLOPs: 146.65 | [default7]: iteration 202/ 3100 | consumed samples: 413696 | consumed tokens: 847249408 | elapsed time per iteration (s): 142.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.418292E-01 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 14.399 | TFLOPs: 146.99 | [default7]: iteration 203/ 3100 | consumed samples: 415744 | consumed tokens: 851443712 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.422912E-01 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.84 | [default7]: iteration 204/ 3100 | consumed samples: 417792 | consumed tokens: 855638016 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.461519E-01 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.33 | [default7]: iteration 205/ 3100 | consumed samples: 419840 | consumed tokens: 859832320 | elapsed time per iteration (s): 142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.423598E-01 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.330 | TFLOPs: 146.29 | [default7]: iteration 206/ 3100 | consumed samples: 421888 | consumed tokens: 864026624 | elapsed time per iteration (s): 142.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.385181E-01 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.416 | TFLOPs: 147.16 | [default7]: iteration 207/ 3100 | consumed samples: 423936 | consumed tokens: 868220928 | elapsed time per iteration (s): 142.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.455391E-01 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.407 | TFLOPs: 147.07 | [default7]: iteration 208/ 3100 | consumed samples: 425984 | consumed tokens: 872415232 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | 
global batch size: 2048 | lm loss: 7.441313E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 209/ 3100 | consumed samples: 428032 | consumed tokens: 876609536 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.367626E-01 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.99 | [default7]: iteration 210/ 3100 | consumed samples: 430080 | consumed tokens: 880803840 | elapsed time per iteration (s): 142.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.407150E-01 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.333 | TFLOPs: 146.32 | [default7]: iteration 211/ 3100 | consumed samples: 432128 | consumed tokens: 884998144 | elapsed time per iteration (s): 142.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.439035E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.398 | TFLOPs: 146.98 | [default7]: iteration 212/ 3100 | consumed samples: 434176 | consumed tokens: 889192448 | elapsed time per iteration (s): 143.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.516332E-01 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.322 | TFLOPs: 146.20 | [default7]: iteration 213/ 3100 | consumed samples: 436224 | consumed tokens: 893386752 | elapsed time per iteration (s): 142.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.396359E-01 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.349 | TFLOPs: 146.48 | [default7]: iteration 214/ 3100 | 
consumed samples: 438272 | consumed tokens: 897581056 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.423021E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 215/ 3100 | consumed samples: 440320 | consumed tokens: 901775360 | elapsed time per iteration (s): 141.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.260318E-01 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.518 | TFLOPs: 148.21 | [default7]: iteration 216/ 3100 | consumed samples: 442368 | consumed tokens: 905969664 | elapsed time per iteration (s): 141.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.356900E-01 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.447 | TFLOPs: 147.49 | [default7]: iteration 217/ 3100 | consumed samples: 444416 | consumed tokens: 910163968 | elapsed time per iteration (s): 142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.392973E-01 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.330 | TFLOPs: 146.29 | [default7]: iteration 218/ 3100 | consumed samples: 446464 | consumed tokens: 914358272 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.448015E-01 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 | [default7]: iteration 219/ 3100 | consumed samples: 448512 | consumed tokens: 918552576 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.333312E-01 | grad norm: 0.560 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 220/ 3100 | consumed samples: 450560 | consumed tokens: 922746880 | elapsed time per iteration (s): 140.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.499141E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.533 | TFLOPs: 148.35 | [default7]: iteration 221/ 3100 | consumed samples: 452608 | consumed tokens: 926941184 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.468154E-01 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 222/ 3100 | consumed samples: 454656 | consumed tokens: 931135488 | elapsed time per iteration (s): 141.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.337772E-01 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.443 | TFLOPs: 147.44 | [default7]: iteration 223/ 3100 | consumed samples: 456704 | consumed tokens: 935329792 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.438694E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.14 | [default7]: iteration 224/ 3100 | consumed samples: 458752 | consumed tokens: 939524096 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.562556E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 225/ 3100 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 141.40 | 
learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.271951E-01 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 226/ 3100 | consumed samples: 462848 | consumed tokens: 947912704 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.273087E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 227/ 3100 | consumed samples: 464896 | consumed tokens: 952107008 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.463019E-01 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.35 | [default7]: iteration 228/ 3100 | consumed samples: 466944 | consumed tokens: 956301312 | elapsed time per iteration (s): 141.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.351043E-01 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.461 | TFLOPs: 147.63 | [default7]: iteration 229/ 3100 | consumed samples: 468992 | consumed tokens: 960495616 | elapsed time per iteration (s): 142.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.384478E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.386 | TFLOPs: 146.86 | [default7]: iteration 230/ 3100 | consumed samples: 471040 | consumed tokens: 964689920 | elapsed time per iteration (s): 141.94 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.329138E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.429 | TFLOPs: 147.30 | 
[default7]: iteration 231/ 3100 | consumed samples: 473088 | consumed tokens: 968884224 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.376148E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 232/ 3100 | consumed samples: 475136 | consumed tokens: 973078528 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.360265E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 233/ 3100 | consumed samples: 477184 | consumed tokens: 977272832 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.272632E-01 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 | [default7]: iteration 234/ 3100 | consumed samples: 479232 | consumed tokens: 981467136 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.362374E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 235/ 3100 | consumed samples: 481280 | consumed tokens: 985661440 | elapsed time per iteration (s): 142.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.367625E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.351 | TFLOPs: 146.50 | [default7]: iteration 236/ 3100 | consumed samples: 483328 | consumed tokens: 989855744 | elapsed time per iteration (s): 142.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.299906E-01 | grad norm: 0.366 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.385 | TFLOPs: 146.84 | [default7]: iteration 237/ 3100 | consumed samples: 485376 | consumed tokens: 994050048 | elapsed time per iteration (s): 143.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.184098E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.314 | TFLOPs: 146.13 | [default7]: iteration 238/ 3100 | consumed samples: 487424 | consumed tokens: 998244352 | elapsed time per iteration (s): 142.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.257406E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.396 | TFLOPs: 146.96 | [default7]: iteration 239/ 3100 | consumed samples: 489472 | consumed tokens: 1002438656 | elapsed time per iteration (s): 142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.294320E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.331 | TFLOPs: 146.30 | [default7]: iteration 240/ 3100 | consumed samples: 491520 | consumed tokens: 1006632960 | elapsed time per iteration (s): 143.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.373648E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.287 | TFLOPs: 145.85 | [default7]: iteration 241/ 3100 | consumed samples: 493568 | consumed tokens: 1010827264 | elapsed time per iteration (s): 142.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.287533E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.389 | TFLOPs: 146.89 | [default7]: iteration 242/ 3100 | consumed samples: 495616 | consumed tokens: 1015021568 | elapsed 
time per iteration (s): 141.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.283406E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.495 | TFLOPs: 147.98 | [default7]: iteration 243/ 3100 | consumed samples: 497664 | consumed tokens: 1019215872 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.309920E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 244/ 3100 | consumed samples: 499712 | consumed tokens: 1023410176 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.326167E-01 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 245/ 3100 | consumed samples: 501760 | consumed tokens: 1027604480 | elapsed time per iteration (s): 143.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.244620E-01 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.304 | TFLOPs: 146.02 | [default7]: iteration 246/ 3100 | consumed samples: 503808 | consumed tokens: 1031798784 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.312448E-01 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 247/ 3100 | consumed samples: 505856 | consumed tokens: 1035993088 | elapsed time per iteration (s): 142.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.288952E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.349 | TFLOPs: 146.48 | [default7]: iteration 248/ 3100 | consumed samples: 507904 | consumed tokens: 1040187392 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.249099E-01 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default0]:saving checkpoint at iteration 249 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-04 05:42:12,931] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step249 is begin to save! [default0]:[2022-09-04 05:42:13,011] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_24-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_10-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_31-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_72-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_28-model_00-model_states.pt... 
[default0]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_08-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_16-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_54-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_20-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_55-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_38-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_45-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_70-model_00-model_states.pt... 
[default7]: iteration 249/ 3100 | consumed samples: 509952 | consumed tokens: 1044381696 | elapsed time per iteration (s): 142.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.270842E-01 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.394 | TFLOPs: 146.94 | [default4]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_07-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_50-model_00-model_states.pt... [default4]:[2022-09-04 05:42:12,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_63-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_64-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_61-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,014] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_56-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_58-model_00-model_states.pt... 
[default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_44-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_36-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_69-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_52-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_17-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_32-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_43-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_42-model_00-model_states.pt... 
[default0]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_06-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,011] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_25-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_15-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_09-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_13-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_22-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_23-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_41-model_00-model_states.pt... 
[default4]:[2022-09-04 05:42:13,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_59-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_26-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_49-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_37-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,011] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_66-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_18-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_53-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_05-model_00-model_states.pt... 
[default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_65-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_62-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_71_model_states.pt... [default4]:[2022-09-04 05:42:13,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_71_model_states.pt. [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_48-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_03-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_35-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_40-model_00-model_states.pt... 
[default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_34-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_47-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_57-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_39-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_71-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_67-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_33-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_60-model_00-model_states.pt... 
[default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_30-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_21-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_11-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_19-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_68-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_14-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_29-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,014] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_12-model_00-model_states.pt... 
[default4]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_27-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_01-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_04-model_00-model_states.pt... [default0]:[2022-09-04 05:42:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_46-model_00-model_states.pt... [default4]:[2022-09-04 05:42:13,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_51-model_00-model_states.pt... [default0]:[2022-09-04 05:42:16,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_72-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_74-model_00-model_states.pt... [default0]:[2022-09-04 05:42:16,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_74-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_70_model_states.pt... [default0]:[2022-09-04 05:42:16,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_70_model_states.pt. [default0]:[2022-09-04 05:42:16,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_28-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,266] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_26_model_states.pt... [default0]:[2022-09-04 05:42:16,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_26_model_states.pt. [default0]:[2022-09-04 05:42:16,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_22-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_20_model_states.pt... [default0]:[2022-09-04 05:42:16,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_20_model_states.pt. [default0]:[2022-09-04 05:42:16,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_68-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_66_model_states.pt... [default0]:[2022-09-04 05:42:16,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_66_model_states.pt. [default4]:[2022-09-04 05:42:16,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_29-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_27_model_states.pt... [default4]:[2022-09-04 05:42:16,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_27_model_states.pt. [default0]:[2022-09-04 05:42:16,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_04-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_02_model_states.pt... [default0]:[2022-09-04 05:42:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_02_model_states.pt. [default0]:[2022-09-04 05:42:16,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_36-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_34_model_states.pt... [default0]:[2022-09-04 05:42:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_32-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_30_model_states.pt... [default0]:[2022-09-04 05:42:16,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_30_model_states.pt. [default4]:[2022-09-04 05:42:16,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_43-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_41_model_states.pt... [default4]:[2022-09-04 05:42:16,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_41_model_states.pt. [default0]:[2022-09-04 05:42:16,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_06-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,424] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_04_model_states.pt... 
[default0]:[2022-09-04 05:42:16,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_04_model_states.pt. [default4]:[2022-09-04 05:42:16,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_13-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_11_model_states.pt... [default4]:[2022-09-04 05:42:16,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_11_model_states.pt. [default4]:[2022-09-04 05:42:16,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_05-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,414] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_03_model_states.pt... [default4]:[2022-09-04 05:42:16,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_03_model_states.pt. [default0]:[2022-09-04 05:42:16,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_40-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_38_model_states.pt... 
[default4]:[2022-09-04 05:42:16,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_67-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_65_model_states.pt... [default4]:[2022-09-04 05:42:16,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_65_model_states.pt. [default0]:[2022-09-04 05:42:16,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_14-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_12_model_states.pt... [default0]:[2022-09-04 05:42:16,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_12_model_states.pt. [default0]:[2022-09-04 05:42:16,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_24-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_22_model_states.pt... [default0]:[2022-09-04 05:42:16,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_22_model_states.pt. 
[default0]:[2022-09-04 05:42:16,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_16-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,508] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_14_model_states.pt... [default0]:[2022-09-04 05:42:16,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_14_model_states.pt. [default0]:[2022-09-04 05:42:16,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_20-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_18_model_states.pt... [default0]:[2022-09-04 05:42:16,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_18_model_states.pt. [default4]:[2022-09-04 05:42:16,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_61-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,467] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_59_model_states.pt... [default4]:[2022-09-04 05:42:16,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_59_model_states.pt. 
[default0]:[2022-09-04 05:42:16,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_34_model_states.pt. [default4]:[2022-09-04 05:42:16,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_25-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_23_model_states.pt... [default4]:[2022-09-04 05:42:16,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_23_model_states.pt. [default4]:[2022-09-04 05:42:16,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_23-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,487] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_21_model_states.pt... [default4]:[2022-09-04 05:42:16,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_21_model_states.pt. [default4]:[2022-09-04 05:42:16,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_41-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_39_model_states.pt... 
[default4]:[2022-09-04 05:42:16,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_37-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_35_model_states.pt... [default4]:[2022-09-04 05:42:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_35_model_states.pt. [default0]:[2022-09-04 05:42:16,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_38_model_states.pt. [default4]:[2022-09-04 05:42:16,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_21-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_19_model_states.pt... [default4]:[2022-09-04 05:42:16,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_19_model_states.pt. [default0]:[2022-09-04 05:42:16,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_12-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_10_model_states.pt... 
[default0]:[2022-09-04 05:42:16,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_10_model_states.pt. [default0]:[2022-09-04 05:42:16,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_08-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_06_model_states.pt... [default0]:[2022-09-04 05:42:16,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_06_model_states.pt. [default4]:[2022-09-04 05:42:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_07-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_05_model_states.pt... [default4]:[2022-09-04 05:42:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_05_model_states.pt. [default4]:[2022-09-04 05:42:16,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_69-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_67_model_states.pt... 
[default4]:[2022-09-04 05:42:16,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_67_model_states.pt. [default0]:[2022-09-04 05:42:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_42-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_40_model_states.pt... [default0]:[2022-09-04 05:42:16,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_40_model_states.pt. [default4]:[2022-09-04 05:42:16,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_15-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_13_model_states.pt... [default4]:[2022-09-04 05:42:16,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_13_model_states.pt. [default4]:[2022-09-04 05:42:16,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_09-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_07_model_states.pt... 
[default4]:[2022-09-04 05:42:16,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_07_model_states.pt. [default4]:[2022-09-04 05:42:16,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_39_model_states.pt. [default0]:[2022-09-04 05:42:16,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_66-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_64_model_states.pt... [default0]:[2022-09-04 05:42:16,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_64_model_states.pt. [default4]:[2022-09-04 05:42:16,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_65-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_63_model_states.pt... [default4]:[2022-09-04 05:42:16,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_53-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_51_model_states.pt... 
[default4]:[2022-09-04 05:42:16,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_51_model_states.pt. [default4]:[2022-09-04 05:42:16,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_33-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_31_model_states.pt... [default4]:[2022-09-04 05:42:16,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_31_model_states.pt. [default0]:[2022-09-04 05:42:16,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_60-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,688] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_58_model_states.pt... [default0]:[2022-09-04 05:42:16,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_58_model_states.pt. [default4]:[2022-09-04 05:42:16,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_11-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_09_model_states.pt... 
[default4]:[2022-09-04 05:42:16,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_09_model_states.pt. [default0]:[2022-09-04 05:42:16,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_10-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_08_model_states.pt... [default0]:[2022-09-04 05:42:16,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_08_model_states.pt. [default4]:[2022-09-04 05:42:16,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_31-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_29_model_states.pt... [default4]:[2022-09-04 05:42:16,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_29_model_states.pt. [default4]:[2022-09-04 05:42:16,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_45-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_43_model_states.pt... 
[default0]:[2022-09-04 05:42:16,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_70-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_68_model_states.pt... [default0]:[2022-09-04 05:42:16,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_68_model_states.pt. [default0]:[2022-09-04 05:42:16,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_64-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_62_model_states.pt... [default0]:[2022-09-04 05:42:16,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_62_model_states.pt. [default0]:[2022-09-04 05:42:16,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_44-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_42_model_states.pt... [default0]:[2022-09-04 05:42:16,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_42_model_states.pt. 
[default4]:[2022-09-04 05:42:16,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_17-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_15_model_states.pt... [default4]:[2022-09-04 05:42:16,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_15_model_states.pt. [default0]:[2022-09-04 05:42:16,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_18-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_16_model_states.pt... [default0]:[2022-09-04 05:42:16,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_16_model_states.pt. [default4]:[2022-09-04 05:42:16,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_63_model_states.pt. [default4]:[2022-09-04 05:42:16,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_39-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_37_model_states.pt... 
[default4]:[2022-09-04 05:42:16,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_37_model_states.pt. [default0]:[2022-09-04 05:42:16,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_30-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_28_model_states.pt... [default0]:[2022-09-04 05:42:16,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_28_model_states.pt. [default4]:[2022-09-04 05:42:16,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_19-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_17_model_states.pt... [default4]:[2022-09-04 05:42:16,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_17_model_states.pt. [default0]:[2022-09-04 05:42:16,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_46-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_44_model_states.pt... 
[default0]:[2022-09-04 05:42:16,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_44_model_states.pt. [default0]:[2022-09-04 05:42:16,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_54-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_52_model_states.pt... [default0]:[2022-09-04 05:42:16,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_52_model_states.pt. [default4]:[2022-09-04 05:42:16,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_43_model_states.pt. [default0]:[2022-09-04 05:42:16,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_38-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_36_model_states.pt... [default0]:[2022-09-04 05:42:16,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_36_model_states.pt. [default0]:[2022-09-04 05:42:16,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_58-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_56_model_states.pt... [default0]:[2022-09-04 05:42:16,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_56_model_states.pt. [default0]:[2022-09-04 05:42:16,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_52-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,768] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_50_model_states.pt... [default0]:[2022-09-04 05:42:16,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_50_model_states.pt. [default4]:[2022-09-04 05:42:16,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_59-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_57_model_states.pt... [default4]:[2022-09-04 05:42:16,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_57_model_states.pt. [default0]:[2022-09-04 05:42:16,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_26-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_24_model_states.pt... [default0]:[2022-09-04 05:42:16,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_24_model_states.pt. [default0]:[2022-09-04 05:42:16,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_62-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_60_model_states.pt... [default0]:[2022-09-04 05:42:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_60_model_states.pt. [default4]:[2022-09-04 05:42:16,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_35-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_33_model_states.pt... [default4]:[2022-09-04 05:42:16,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_33_model_states.pt. [default0]:[2022-09-04 05:42:16,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_34-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_32_model_states.pt... [default0]:[2022-09-04 05:42:16,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_32_model_states.pt. [default4]:[2022-09-04 05:42:16,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_47-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_45_model_states.pt... [default4]:[2022-09-04 05:42:16,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_45_model_states.pt. [default4]:[2022-09-04 05:42:16,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_71-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_69_model_states.pt... [default4]:[2022-09-04 05:42:16,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_69_model_states.pt. [default4]:[2022-09-04 05:42:16,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_27-model_00-model_states.pt. 
[default4]:[2022-09-04 05:42:16,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_25_model_states.pt... [default4]:[2022-09-04 05:42:16,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_25_model_states.pt. [default4]:[2022-09-04 05:42:16,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_55-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_53_model_states.pt... [default4]:[2022-09-04 05:42:16,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_53_model_states.pt. [default4]:[2022-09-04 05:42:16,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_63-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_61_model_states.pt... [default4]:[2022-09-04 05:42:16,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_61_model_states.pt. [default4]:[2022-09-04 05:42:16,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_49-model_00-model_states.pt. 
[default4]:[2022-09-04 05:42:16,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_47_model_states.pt... [default4]:[2022-09-04 05:42:16,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_47_model_states.pt. [default4]:[2022-09-04 05:42:16,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_57-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_55_model_states.pt... [default4]:[2022-09-04 05:42:16,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_55_model_states.pt. [default4]:[2022-09-04 05:42:16,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_51-model_00-model_states.pt. [default4]:[2022-09-04 05:42:16,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_49_model_states.pt... [default4]:[2022-09-04 05:42:16,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_49_model_states.pt. [default0]:[2022-09-04 05:42:16,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_50-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:16,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_48_model_states.pt... [default0]:[2022-09-04 05:42:16,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_48_model_states.pt. [default0]:[2022-09-04 05:42:16,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_48-model_00-model_states.pt. [default0]:[2022-09-04 05:42:16,986] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_46_model_states.pt... [default0]:[2022-09-04 05:42:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_46_model_states.pt. [default4]:[2022-09-04 05:42:17,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_03-model_00-model_states.pt. [default4]:[2022-09-04 05:42:17,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_01_model_states.pt... [default4]:[2022-09-04 05:42:17,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_01_model_states.pt. [default0]:[2022-09-04 05:42:17,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_56-model_00-model_states.pt. 
[default0]:[2022-09-04 05:42:17,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_54_model_states.pt... [default0]:[2022-09-04 05:42:17,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_54_model_states.pt. [default0]:[2022-09-04 05:42:17,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/layer_01-model_00-model_states.pt. [default0]:[2022-09-04 05:42:17,902] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_00_model_states.pt [default0]:[2022-09-04 05:42:17,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_00_model_states.pt... [default0]:[2022-09-04 05:42:17,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/mp_rank_00_model_states.pt. [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... 
[default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... 
[default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... 
[default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... 
[default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... 
[default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... 
[default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... 
[default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... 
[default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... 
[default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... 
[default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... 
[default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... 
[default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... 
[default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... 
[default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... 
[default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... 
[default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... 
[default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... 
[default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... 
[default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... 
[default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... 
[default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... 
[default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... 
[default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... 
[default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... [default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default2]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default5]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... 
[default4]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default3]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default1]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default0]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default6]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default7]:[2022-09-04 05:42:18,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... [default1]:[2022-09-04 05:42:25,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. 
[default1]:[2022-09-04 05:42:25,107] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default0]:[2022-09-04 05:42:25,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-04 05:42:25,200] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default3]:[2022-09-04 05:42:25,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-04 05:42:25,248] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default6]:[2022-09-04 05:42:25,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. [default6]:[2022-09-04 05:42:25,343] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default0]:[2022-09-04 05:42:25,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. 
[default0]:[2022-09-04 05:42:25,417] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default5]:[2022-09-04 05:42:25,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. [default5]:[2022-09-04 05:42:25,448] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default1]:[2022-09-04 05:42:25,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. [default1]:[2022-09-04 05:42:25,443] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default2]:[2022-09-04 05:42:25,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. [default2]:[2022-09-04 05:42:25,568] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default7]:[2022-09-04 05:42:25,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. 
[default7]:[2022-09-04 05:42:25,599] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default0]:[2022-09-04 05:42:25,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. [default0]:[2022-09-04 05:42:25,712] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default4]:[2022-09-04 05:42:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. [default4]:[2022-09-04 05:42:25,707] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default1]:[2022-09-04 05:42:25,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. [default1]:[2022-09-04 05:42:25,721] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default7]:[2022-09-04 05:42:25,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. 
[default7]:[2022-09-04 05:42:25,743] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default5]:[2022-09-04 05:42:25,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. [default5]:[2022-09-04 05:42:25,819] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default2]:[2022-09-04 05:42:25,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-04 05:42:25,863] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default4]:[2022-09-04 05:42:25,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. [default4]:[2022-09-04 05:42:25,844] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default0]:[2022-09-04 05:42:25,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. 
[default0]:[2022-09-04 05:42:25,794] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default3]:[2022-09-04 05:42:25,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-04 05:42:25,810] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default5]:[2022-09-04 05:42:25,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. [default5]:[2022-09-04 05:42:25,870] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default2]:[2022-09-04 05:42:25,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. [default2]:[2022-09-04 05:42:25,862] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default1]:[2022-09-04 05:42:25,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. 
[default1]:[2022-09-04 05:42:25,858] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default6]:[2022-09-04 05:42:25,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-04 05:42:25,941] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default3]:[2022-09-04 05:42:25,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-04 05:42:25,946] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default0]:[2022-09-04 05:42:26,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-04 05:42:26,032] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default2]:[2022-09-04 05:42:26,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. 
[default2]:[2022-09-04 05:42:26,000] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default6]:[2022-09-04 05:42:26,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-04 05:42:26,080] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default6]:[2022-09-04 05:42:26,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. [default6]:[2022-09-04 05:42:26,109] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default3]:[2022-09-04 05:42:26,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. [default3]:[2022-09-04 05:42:26,113] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default7]:[2022-09-04 05:42:26,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. 
[default7]:[2022-09-04 05:42:26,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default2]:[2022-09-04 05:42:26,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. [default2]:[2022-09-04 05:42:26,076] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default4]:[2022-09-04 05:42:26,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. [default4]:[2022-09-04 05:42:26,055] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default7]:[2022-09-04 05:42:26,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. [default7]:[2022-09-04 05:42:26,118] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default7]:[2022-09-04 05:42:26,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. 
[default7]:[2022-09-04 05:42:26,104] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default1]:[2022-09-04 05:42:26,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. [default1]:[2022-09-04 05:42:26,157] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default2]:[2022-09-04 05:42:26,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-04 05:42:26,246] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default7]:[2022-09-04 05:42:26,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. [default7]:[2022-09-04 05:42:26,276] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default5]:[2022-09-04 05:42:26,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. 
[default5]:[2022-09-04 05:42:26,314] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default3]:[2022-09-04 05:42:26,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. [default3]:[2022-09-04 05:42:26,234] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default7]:[2022-09-04 05:42:26,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. [default7]:[2022-09-04 05:42:26,260] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default1]:[2022-09-04 05:42:26,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-04 05:42:26,331] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default1]:[2022-09-04 05:42:26,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. 
[default1]:[2022-09-04 05:42:26,344] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default0]:[2022-09-04 05:42:26,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. [default0]:[2022-09-04 05:42:26,415] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default3]:[2022-09-04 05:42:26,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-04 05:42:26,405] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default1]:[2022-09-04 05:42:26,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-04 05:42:26,451] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default7]:[2022-09-04 05:42:26,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. 
[default7]:[2022-09-04 05:42:26,444] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default7]:[2022-09-04 05:42:26,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-04 05:42:26,435] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default3]:[2022-09-04 05:42:26,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. [default3]:[2022-09-04 05:42:26,446] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default5]:[2022-09-04 05:42:26,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-04 05:42:26,494] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default5]:[2022-09-04 05:42:26,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. 
[default5]:[2022-09-04 05:42:26,521] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default4]:[2022-09-04 05:42:26,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. [default4]:[2022-09-04 05:42:26,531] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default6]:[2022-09-04 05:42:26,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. [default6]:[2022-09-04 05:42:26,560] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default6]:[2022-09-04 05:42:26,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-04 05:42:26,477] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default5]:[2022-09-04 05:42:26,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. 
[default5]:[2022-09-04 05:42:26,612] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default6]:[2022-09-04 05:42:26,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-04 05:42:26,584] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default3]:[2022-09-04 05:42:26,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. [default3]:[2022-09-04 05:42:26,639] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default3]:[2022-09-04 05:42:26,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. [default3]:[2022-09-04 05:42:26,670] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default5]:[2022-09-04 05:42:26,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. 
[default5]:[2022-09-04 05:42:26,698] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default5]:[2022-09-04 05:42:26,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. [default5]:[2022-09-04 05:42:26,699] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default5]:[2022-09-04 05:42:26,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. [default5]:[2022-09-04 05:42:26,630] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default6]:[2022-09-04 05:42:26,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. [default6]:[2022-09-04 05:42:26,678] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default6]:[2022-09-04 05:42:26,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. 
[default6]:[2022-09-04 05:42:26,749] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default2]:[2022-09-04 05:42:26,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-04 05:42:26,710] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default3]:[2022-09-04 05:42:26,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-04 05:42:26,759] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default0]:[2022-09-04 05:42:26,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-04 05:42:26,799] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default6]:[2022-09-04 05:42:26,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. 
[default6]:[2022-09-04 05:42:26,798] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default5]:[2022-09-04 05:42:26,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-04 05:42:26,799] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default4]:[2022-09-04 05:42:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-04 05:42:26,829] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default3]:[2022-09-04 05:42:26,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-04 05:42:26,773] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default2]:[2022-09-04 05:42:26,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. 
[default2]:[2022-09-04 05:42:26,810] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default7]:[2022-09-04 05:42:26,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. [default7]:[2022-09-04 05:42:26,845] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default4]:[2022-09-04 05:42:26,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. [default4]:[2022-09-04 05:42:26,802] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default6]:[2022-09-04 05:42:26,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. [default6]:[2022-09-04 05:42:26,820] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default2]:[2022-09-04 05:42:26,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. 
[default2]:[2022-09-04 05:42:26,847] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default7]:[2022-09-04 05:42:26,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. [default7]:[2022-09-04 05:42:26,845] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default6]:[2022-09-04 05:42:26,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. [default6]:[2022-09-04 05:42:26,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default1]:[2022-09-04 05:42:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. [default1]:[2022-09-04 05:42:26,840] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default6]:[2022-09-04 05:42:26,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. 
[default6]:[2022-09-04 05:42:26,919] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default3]:[2022-09-04 05:42:26,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-04 05:42:26,917] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default2]:[2022-09-04 05:42:26,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. [default2]:[2022-09-04 05:42:26,930] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default0]:[2022-09-04 05:42:26,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-04 05:42:26,909] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default6]:[2022-09-04 05:42:26,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. 
[default6]:[2022-09-04 05:42:26,940] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default2]:[2022-09-04 05:42:26,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-04 05:42:26,967] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default6]:[2022-09-04 05:42:27,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. [default6]:[2022-09-04 05:42:27,011] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default0]:[2022-09-04 05:42:27,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. [default0]:[2022-09-04 05:42:27,010] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default2]:[2022-09-04 05:42:27,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. 
[default2]:[2022-09-04 05:42:27,001] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default3]:[2022-09-04 05:42:27,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-04 05:42:27,005] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default2]:[2022-09-04 05:42:27,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-04 05:42:27,046] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default7]:[2022-09-04 05:42:27,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-04 05:42:27,026] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default1]:[2022-09-04 05:42:27,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. 
[default1]:[2022-09-04 05:42:27,015] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default3]:[2022-09-04 05:42:27,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. [default3]:[2022-09-04 05:42:27,080] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default1]:[2022-09-04 05:42:27,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-04 05:42:27,069] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default0]:[2022-09-04 05:42:27,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. [default0]:[2022-09-04 05:42:27,019] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default2]:[2022-09-04 05:42:27,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. 
[default2]:[2022-09-04 05:42:27,073] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default5]:[2022-09-04 05:42:27,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-04 05:42:27,047] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default1]:[2022-09-04 05:42:27,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. [default1]:[2022-09-04 05:42:27,102] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default4]:[2022-09-04 05:42:27,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. [default4]:[2022-09-04 05:42:27,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default0]:[2022-09-04 05:42:27,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. 
[default0]:[2022-09-04 05:42:27,132] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default0]:[2022-09-04 05:42:27,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. [default0]:[2022-09-04 05:42:27,143] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default2]:[2022-09-04 05:42:27,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. [default2]:[2022-09-04 05:42:27,109] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default4]:[2022-09-04 05:42:27,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-04 05:42:27,172] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default1]:[2022-09-04 05:42:27,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. 
[default1]:[2022-09-04 05:42:27,162] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default7]:[2022-09-04 05:42:27,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-04 05:42:27,139] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default0]:[2022-09-04 05:42:27,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-04 05:42:27,163] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default5]:[2022-09-04 05:42:27,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-04 05:42:27,242] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default1]:[2022-09-04 05:42:27,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. 
[default1]:[2022-09-04 05:42:27,193] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default3]:[2022-09-04 05:42:27,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-04 05:42:27,250] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default3]:[2022-09-04 05:42:27,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-04 05:42:27,251] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default5]:[2022-09-04 05:42:27,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-04 05:42:27,255] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default4]:[2022-09-04 05:42:27,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. 
[default4]:[2022-09-04 05:42:27,284] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default3]:[2022-09-04 05:42:27,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-04 05:42:27,336] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default1]:[2022-09-04 05:42:27,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. [default1]:[2022-09-04 05:42:27,288] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default1]:[2022-09-04 05:42:27,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-04 05:42:27,365] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default0]:[2022-09-04 05:42:27,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. 
[default0]:[2022-09-04 05:42:27,362] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default7]:[2022-09-04 05:42:27,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. [default7]:[2022-09-04 05:42:27,441] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default3]:[2022-09-04 05:42:27,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-04 05:42:27,450] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default0]:[2022-09-04 05:42:27,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-04 05:42:27,440] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default5]:[2022-09-04 05:42:27,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. 
[default5]:[2022-09-04 05:42:27,553] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default6]:[2022-09-04 05:42:27,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. [default6]:[2022-09-04 05:42:27,567] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default3]:[2022-09-04 05:42:27,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-04 05:42:27,490] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default5]:[2022-09-04 05:42:27,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. [default5]:[2022-09-04 05:42:27,512] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default2]:[2022-09-04 05:42:27,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. 
[default2]:[2022-09-04 05:42:27,565] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default4]:[2022-09-04 05:42:27,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-04 05:42:27,530] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default4]:[2022-09-04 05:42:27,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-04 05:42:27,510] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default2]:[2022-09-04 05:42:27,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. [default2]:[2022-09-04 05:42:27,608] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default7]:[2022-09-04 05:42:27,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. 
[default7]:[2022-09-04 05:42:27,558] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default0]:[2022-09-04 05:42:27,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-04 05:42:27,548] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default4]:[2022-09-04 05:42:27,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-04 05:42:27,644] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default4]:[2022-09-04 05:42:27,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. [default4]:[2022-09-04 05:42:27,629] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default3]:[2022-09-04 05:42:27,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. 
[default3]:[2022-09-04 05:42:27,576] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default4]:[2022-09-04 05:42:27,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. [default4]:[2022-09-04 05:42:27,643] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default1]:[2022-09-04 05:42:27,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. [default1]:[2022-09-04 05:42:27,591] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default4]:[2022-09-04 05:42:27,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-04 05:42:27,617] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default0]:[2022-09-04 05:42:27,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. 
[default0]:[2022-09-04 05:42:27,602] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default0]:[2022-09-04 05:42:27,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. [default0]:[2022-09-04 05:42:27,603] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default4]:[2022-09-04 05:42:27,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-04 05:42:27,627] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default3]:[2022-09-04 05:42:27,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-04 05:42:27,688] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default3]:[2022-09-04 05:42:27,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. 
[default3]:[2022-09-04 05:42:27,710] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default4]:[2022-09-04 05:42:27,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-04 05:42:27,692] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default3]:[2022-09-04 05:42:27,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-04 05:42:27,694] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default7]:[2022-09-04 05:42:27,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. [default7]:[2022-09-04 05:42:27,669] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default5]:[2022-09-04 05:42:27,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. 
[default5]:[2022-09-04 05:42:27,656] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default0]:[2022-09-04 05:42:27,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-04 05:42:27,735] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default7]:[2022-09-04 05:42:27,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. [default7]:[2022-09-04 05:42:27,696] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default2]:[2022-09-04 05:42:27,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. [default2]:[2022-09-04 05:42:27,718] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default0]:[2022-09-04 05:42:27,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. 
[default0]:[2022-09-04 05:42:27,705] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default6]:[2022-09-04 05:42:27,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. [default6]:[2022-09-04 05:42:27,768] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default1]:[2022-09-04 05:42:27,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-04 05:42:27,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default5]:[2022-09-04 05:42:27,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-04 05:42:27,769] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default4]:[2022-09-04 05:42:27,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. 
[default4]:[2022-09-04 05:42:27,706] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default1]:[2022-09-04 05:42:27,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-04 05:42:27,698] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default1]:[2022-09-04 05:42:27,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-04 05:42:27,798] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default3]:[2022-09-04 05:42:27,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. [default3]:[2022-09-04 05:42:27,714] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default1]:[2022-09-04 05:42:27,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. 
[default1]:[2022-09-04 05:42:27,712] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default2]:[2022-09-04 05:42:27,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. [default2]:[2022-09-04 05:42:27,764] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default2]:[2022-09-04 05:42:27,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. [default2]:[2022-09-04 05:42:27,823] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default2]:[2022-09-04 05:42:27,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. [default2]:[2022-09-04 05:42:27,808] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default1]:[2022-09-04 05:42:27,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. 
[default1]:[2022-09-04 05:42:27,870] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default6]:[2022-09-04 05:42:27,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. [default6]:[2022-09-04 05:42:27,891] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default7]:[2022-09-04 05:42:27,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. [default7]:[2022-09-04 05:42:27,935] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default0]:[2022-09-04 05:42:27,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-04 05:42:27,892] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default0]:[2022-09-04 05:42:27,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. 
[default0]:[2022-09-04 05:42:27,899] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default4]:[2022-09-04 05:42:27,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-04 05:42:27,938] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default7]:[2022-09-04 05:42:27,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. [default7]:[2022-09-04 05:42:27,964] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default2]:[2022-09-04 05:42:27,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. [default2]:[2022-09-04 05:42:27,979] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default5]:[2022-09-04 05:42:27,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. 
[default5]:[2022-09-04 05:42:27,997] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default7]:[2022-09-04 05:42:28,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-04 05:42:28,014] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default2]:[2022-09-04 05:42:28,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-04 05:42:28,025] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default2]:[2022-09-04 05:42:28,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. [default2]:[2022-09-04 05:42:28,025] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default1]:[2022-09-04 05:42:28,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. 
[default1]:[2022-09-04 05:42:28,056] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default4]:[2022-09-04 05:42:28,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-04 05:42:28,068] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default0]:[2022-09-04 05:42:28,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. [default0]:[2022-09-04 05:42:28,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default2]:[2022-09-04 05:42:28,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-04 05:42:28,174] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default5]:[2022-09-04 05:42:28,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. 
[default5]:[2022-09-04 05:42:28,158] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default6]:[2022-09-04 05:42:28,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-04 05:42:28,139] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default5]:[2022-09-04 05:42:28,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. [default5]:[2022-09-04 05:42:28,244] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default5]:[2022-09-04 05:42:28,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. [default5]:[2022-09-04 05:42:28,241] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default2]:[2022-09-04 05:42:28,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. 
[default2]:[2022-09-04 05:42:28,226] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default6]:[2022-09-04 05:42:28,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-04 05:42:28,263] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default7]:[2022-09-04 05:42:28,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. [default7]:[2022-09-04 05:42:28,330] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default7]:[2022-09-04 05:42:28,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. [default7]:[2022-09-04 05:42:28,342] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default1]:[2022-09-04 05:42:28,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. 
[default1]:[2022-09-04 05:42:28,326] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default5]:[2022-09-04 05:42:28,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-04 05:42:28,409] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default6]:[2022-09-04 05:42:28,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-04 05:42:28,398] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default7]:[2022-09-04 05:42:28,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-04 05:42:28,489] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default0]:[2022-09-04 05:42:28,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. 
[default0]:[2022-09-04 05:42:28,443] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default1]:[2022-09-04 05:42:28,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-04 05:42:28,549] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default7]:[2022-09-04 05:42:28,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. [default7]:[2022-09-04 05:42:28,533] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default4]:[2022-09-04 05:42:28,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. [default4]:[2022-09-04 05:42:28,578] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default6]:[2022-09-04 05:42:28,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. 
[default6]:[2022-09-04 05:42:28,628] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default0]:[2022-09-04 05:42:28,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. [default0]:[2022-09-04 05:42:28,662] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default5]:[2022-09-04 05:42:28,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. [default5]:[2022-09-04 05:42:28,768] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default1]:[2022-09-04 05:42:28,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. [default1]:[2022-09-04 05:42:28,731] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default7]:[2022-09-04 05:42:28,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. 
[default7]:[2022-09-04 05:42:28,748] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default6]:[2022-09-04 05:42:28,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. [default6]:[2022-09-04 05:42:28,837] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default4]:[2022-09-04 05:42:28,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. [default4]:[2022-09-04 05:42:28,834] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default6]:[2022-09-04 05:42:28,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. [default6]:[2022-09-04 05:42:28,930] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default1]:[2022-09-04 05:42:29,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. 
[default1]:[2022-09-04 05:42:29,008] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default7]:[2022-09-04 05:42:29,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-04 05:42:29,506] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default0]:[2022-09-04 05:42:30,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-04 05:42:30,050] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default4]:[2022-09-04 05:42:30,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-04 05:42:30,360] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default2]:[2022-09-04 05:42:30,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. 
[default2]:[2022-09-04 05:42:30,695] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default5]:[2022-09-04 05:42:30,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. [default5]:[2022-09-04 05:42:30,773] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default6]:[2022-09-04 05:42:30,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. [default6]:[2022-09-04 05:42:30,979] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default7]:[2022-09-04 05:42:30,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. [default7]:[2022-09-04 05:42:30,912] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default5]:[2022-09-04 05:42:31,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. 
[default5]:[2022-09-04 05:42:31,358] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default3]:[2022-09-04 05:42:31,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. [default3]:[2022-09-04 05:42:31,456] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default4]:[2022-09-04 05:42:31,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. [default4]:[2022-09-04 05:42:31,379] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default3]:[2022-09-04 05:42:31,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-04 05:42:31,490] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default5]:[2022-09-04 05:42:31,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. 
[default5]:[2022-09-04 05:42:31,441] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default4]:[2022-09-04 05:42:31,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. [default4]:[2022-09-04 05:42:31,456] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default7]:[2022-09-04 05:42:31,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. [default7]:[2022-09-04 05:42:31,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default4]:[2022-09-04 05:42:31,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-04 05:42:31,492] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default3]:[2022-09-04 05:42:31,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. 
[default3]:[2022-09-04 05:42:31,493] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default5]:[2022-09-04 05:42:31,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-04 05:42:31,568] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default3]:[2022-09-04 05:42:31,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-04 05:42:31,719] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default6]:[2022-09-04 05:42:31,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. [default6]:[2022-09-04 05:42:31,749] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default5]:[2022-09-04 05:42:31,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. 
[default5]:[2022-09-04 05:42:31,843] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default4]:[2022-09-04 05:42:32,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-04 05:42:32,064] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default0]:[2022-09-04 05:42:32,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. [default0]:[2022-09-04 05:42:32,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default0]:[2022-09-04 05:42:32,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. [default0]:[2022-09-04 05:42:32,118] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default6]:[2022-09-04 05:42:32,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. 
[default6]:[2022-09-04 05:42:32,108] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default1]:[2022-09-04 05:42:32,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-04 05:42:32,346] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default6]:[2022-09-04 05:42:32,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. [default6]:[2022-09-04 05:42:32,333] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default4]:[2022-09-04 05:42:32,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-04 05:42:32,384] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default0]:[2022-09-04 05:42:32,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. 
[default0]:[2022-09-04 05:42:32,424] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default2]:[2022-09-04 05:42:32,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. [default2]:[2022-09-04 05:42:32,526] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default6]:[2022-09-04 05:42:32,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. [default6]:[2022-09-04 05:42:32,604] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default1]:[2022-09-04 05:42:32,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-04 05:42:32,554] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default7]:[2022-09-04 05:42:32,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. 
[default7]:[2022-09-04 05:42:32,627] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default6]:[2022-09-04 05:42:32,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. [default6]:[2022-09-04 05:42:32,632] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default3]:[2022-09-04 05:42:32,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-04 05:42:32,678] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default2]:[2022-09-04 05:42:32,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. [default2]:[2022-09-04 05:42:32,689] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default2]:[2022-09-04 05:42:32,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. 
[default2]:[2022-09-04 05:42:32,744] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default2]:[2022-09-04 05:42:32,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. [default2]:[2022-09-04 05:42:32,773] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default0]:[2022-09-04 05:42:32,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-04 05:42:32,818] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default3]:[2022-09-04 05:42:32,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-04 05:42:32,781] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default4]:[2022-09-04 05:42:33,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. 
[default4]:[2022-09-04 05:42:33,142] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default1]:[2022-09-04 05:42:33,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. [default1]:[2022-09-04 05:42:33,222] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default0]:[2022-09-04 05:42:33,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-04 05:42:33,187] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default4]:[2022-09-04 05:42:33,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-04 05:42:33,403] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default1]:[2022-09-04 05:42:33,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. 
[default1]:[2022-09-04 05:42:33,560] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default5]:[2022-09-04 05:42:33,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. [default5]:[2022-09-04 05:42:33,842] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default3]:[2022-09-04 05:42:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-04 05:42:33,787] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default7]:[2022-09-04 05:42:33,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-04 05:42:33,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default5]:[2022-09-04 05:42:33,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. 
[default5]:[2022-09-04 05:42:33,866] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default7]:[2022-09-04 05:42:34,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. [default7]:[2022-09-04 05:42:34,026] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default6]:[2022-09-04 05:42:34,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. [default6]:[2022-09-04 05:42:34,058] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default6]:[2022-09-04 05:42:34,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-04 05:42:34,153] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default4]:[2022-09-04 05:42:34,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. 
[default4]:[2022-09-04 05:42:34,124] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default2]:[2022-09-04 05:42:34,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-04 05:42:34,421] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default2]:[2022-09-04 05:42:34,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-04 05:42:34,492] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default5]:[2022-09-04 05:42:34,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-04 05:42:34,573] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default3]:[2022-09-04 05:42:34,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. 
[default3]:[2022-09-04 05:42:34,791] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default3]:[2022-09-04 05:42:34,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. [default3]:[2022-09-04 05:42:34,963] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default5]:[2022-09-04 05:42:35,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-04 05:42:35,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default4]:[2022-09-04 05:42:35,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-04 05:42:35,100] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default7]:[2022-09-04 05:42:35,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. 
[default7]:[2022-09-04 05:42:35,560] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default6]:[2022-09-04 05:42:35,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. [default6]:[2022-09-04 05:42:35,635] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default5]:[2022-09-04 05:42:35,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-04 05:42:35,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default1]:[2022-09-04 05:42:35,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. [default1]:[2022-09-04 05:42:35,918] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default0]:[2022-09-04 05:42:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. 
[default0]:[2022-09-04 05:42:35,963] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default7]:[2022-09-04 05:42:36,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-04 05:42:36,227] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default4]:[2022-09-04 05:42:36,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. [default4]:[2022-09-04 05:42:36,347] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default0]:[2022-09-04 05:42:36,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. [default0]:[2022-09-04 05:42:36,407] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default7]:[2022-09-04 05:42:36,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. 
[default7]:[2022-09-04 05:42:36,425] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt [default6]:[2022-09-04 05:42:36,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. [default6]:[2022-09-04 05:42:36,520] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default2]:[2022-09-04 05:42:36,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. [default2]:[2022-09-04 05:42:36,522] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default3]:[2022-09-04 05:42:36,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-04 05:42:36,515] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default2]:[2022-09-04 05:42:36,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. 
[default2]:[2022-09-04 05:42:36,632] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default6]:[2022-09-04 05:42:36,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. [default6]:[2022-09-04 05:42:36,563] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default1]:[2022-09-04 05:42:36,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-04 05:42:36,580] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default7]:[2022-09-04 05:42:36,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. [default7]:[2022-09-04 05:42:36,739] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default4]:[2022-09-04 05:42:37,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. 
[default4]:[2022-09-04 05:42:37,002] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt [default1]:[2022-09-04 05:42:37,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-04 05:42:37,177] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default4]:[2022-09-04 05:42:37,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. [default4]:[2022-09-04 05:42:37,228] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default5]:[2022-09-04 05:42:37,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. [default5]:[2022-09-04 05:42:37,306] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default3]:[2022-09-04 05:42:37,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. 
[default3]:[2022-09-04 05:42:37,303] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default0]:[2022-09-04 05:42:37,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. [default0]:[2022-09-04 05:42:37,297] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default6]:[2022-09-04 05:42:37,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. [default6]:[2022-09-04 05:42:37,954] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default7]:[2022-09-04 05:42:37,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. [default7]:[2022-09-04 05:42:37,985] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default2]:[2022-09-04 05:42:38,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
[default2]:[2022-09-04 05:42:38,051] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default3]:[2022-09-04 05:42:38,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. [default3]:[2022-09-04 05:42:38,186] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default1]:[2022-09-04 05:42:38,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-04 05:42:38,329] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default1]:[2022-09-04 05:42:38,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-04 05:42:38,879] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default0]:[2022-09-04 05:42:39,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. 
[default0]:[2022-09-04 05:42:39,321] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default0]:[2022-09-04 05:42:39,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [default0]:[2022-09-04 05:42:39,542] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-04 05:42:41,535] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]: successfully saved checkpoint at iteration 249 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:time (ms) | save-checkpoint: 28608.21 [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-04 05:42:41,518] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step249/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! 
[default3]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default2]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default0]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default4]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default7]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default1]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default5]:[2022-09-04 05:42:41,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default3]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now! [default6]:[2022-09-04 05:42:41,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step249 is ready now!
[default7]: iteration 250/ 3100 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 170.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.255322E-01 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 11.994 | TFLOPs: 122.44 | [default7]:---------------------------------------------------------------------------------------------------------- [default7]:validation_pretraining loss at iteration 250 | lm loss value: 2.384036E+00 | lm loss PPL: 1.084860E+01 | [default7]:---------------------------------------------------------------------------------------------------------- [default7]:----------------------------------------------------------------------------------------- [default7]:valid loss at iteration 250 | lm loss value: 1.255310E+00 | lm loss PPL: 3.508926E+00 | [default7]:----------------------------------------------------------------------------------------- [default7]: iteration 251/ 3100 | consumed samples: 514048 | consumed tokens: 1052770304 | elapsed time per iteration (s): 224.85 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.232516E-01 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 9.108 | TFLOPs: 92.98 | [default7]: iteration 252/ 3100 | consumed samples: 516096 | consumed tokens: 1056964608 | elapsed time per 
iteration (s): 141.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.148634E-01 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.515 | TFLOPs: 148.18 | [default7]: iteration 253/ 3100 | consumed samples: 518144 | consumed tokens: 1061158912 | elapsed time per iteration (s): 142.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.269530E-01 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.326 | TFLOPs: 146.25 | [default7]: iteration 254/ 3100 | consumed samples: 520192 | consumed tokens: 1065353216 | elapsed time per iteration (s): 141.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.251751E-01 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.452 | TFLOPs: 147.53 | [default7]: iteration 255/ 3100 | consumed samples: 522240 | consumed tokens: 1069547520 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.204057E-01 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 256/ 3100 | consumed samples: 524288 | consumed tokens: 1073741824 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.291830E-01 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 257/ 3100 | consumed samples: 526336 | consumed tokens: 1077936128 | elapsed time per iteration (s): 141.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.281418E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
14.516 | TFLOPs: 148.18 | [default7]: iteration 258/ 3100 | consumed samples: 528384 | consumed tokens: 1082130432 | elapsed time per iteration (s): 142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.166064E-01 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.331 | TFLOPs: 146.30 | [default7]: iteration 259/ 3100 | consumed samples: 530432 | consumed tokens: 1086324736 | elapsed time per iteration (s): 142.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.333514E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.404 | TFLOPs: 147.04 | [default7]: iteration 260/ 3100 | consumed samples: 532480 | consumed tokens: 1090519040 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.248009E-01 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 261/ 3100 | consumed samples: 534528 | consumed tokens: 1094713344 | elapsed time per iteration (s): 142.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.276165E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.371 | TFLOPs: 146.71 | [default7]: iteration 262/ 3100 | consumed samples: 536576 | consumed tokens: 1098907648 | elapsed time per iteration (s): 142.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.178865E-01 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.329 | TFLOPs: 146.28 | [default7]: iteration 263/ 3100 | consumed samples: 538624 | consumed tokens: 1103101952 | elapsed time per iteration (s): 142.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
7.199655E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.339 | TFLOPs: 146.38 | [default7]: iteration 264/ 3100 | consumed samples: 540672 | consumed tokens: 1107296256 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.123626E-01 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 265/ 3100 | consumed samples: 542720 | consumed tokens: 1111490560 | elapsed time per iteration (s): 142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.277120E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.331 | TFLOPs: 146.29 | [default7]: iteration 266/ 3100 | consumed samples: 544768 | consumed tokens: 1115684864 | elapsed time per iteration (s): 142.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.267170E-01 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.370 | TFLOPs: 146.69 | [default7]: iteration 267/ 3100 | consumed samples: 546816 | consumed tokens: 1119879168 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.181975E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 268/ 3100 | consumed samples: 548864 | consumed tokens: 1124073472 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.152599E-01 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 269/ 3100 | consumed samples: 550912 | 
consumed tokens: 1128267776 | elapsed time per iteration (s): 142.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.128665E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.325 | TFLOPs: 146.24 | [default7]: iteration 270/ 3100 | consumed samples: 552960 | consumed tokens: 1132462080 | elapsed time per iteration (s): 142.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.228063E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.335 | TFLOPs: 146.34 | [default7]: iteration 271/ 3100 | consumed samples: 555008 | consumed tokens: 1136656384 | elapsed time per iteration (s): 143.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.189485E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.322 | TFLOPs: 146.20 | [default7]: iteration 272/ 3100 | consumed samples: 557056 | consumed tokens: 1140850688 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.170005E-01 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 273/ 3100 | consumed samples: 559104 | consumed tokens: 1145044992 | elapsed time per iteration (s): 142.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.263371E-01 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.340 | TFLOPs: 146.39 | [default7]: iteration 274/ 3100 | consumed samples: 561152 | consumed tokens: 1149239296 | elapsed time per iteration (s): 143.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.186748E-01 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 14.313 | TFLOPs: 146.12 | [default7]: iteration 275/ 3100 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 140.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.120414E-01 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.530 | TFLOPs: 148.33 | [default7]: iteration 276/ 3100 | consumed samples: 565248 | consumed tokens: 1157627904 | elapsed time per iteration (s): 142.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.183582E-01 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.345 | TFLOPs: 146.44 | [default7]: iteration 277/ 3100 | consumed samples: 567296 | consumed tokens: 1161822208 | elapsed time per iteration (s): 141.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.231562E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.519 | TFLOPs: 148.21 | [default7]: iteration 278/ 3100 | consumed samples: 569344 | consumed tokens: 1166016512 | elapsed time per iteration (s): 142.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.319224E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.344 | TFLOPs: 146.43 | [default7]: iteration 279/ 3100 | consumed samples: 571392 | consumed tokens: 1170210816 | elapsed time per iteration (s): 143.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.198663E-01 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.314 | TFLOPs: 146.12 | [default7]: iteration 280/ 3100 | consumed samples: 573440 | consumed tokens: 1174405120 | elapsed time per iteration (s): 143.01 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 7.094195E-01 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.320 | TFLOPs: 146.19 | [default7]: iteration 281/ 3100 | consumed samples: 575488 | consumed tokens: 1178599424 | elapsed time per iteration (s): 141.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.203704E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.447 | TFLOPs: 147.48 | [default7]: iteration 282/ 3100 | consumed samples: 577536 | consumed tokens: 1182793728 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.195836E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 283/ 3100 | consumed samples: 579584 | consumed tokens: 1186988032 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.142539E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.80 | [default7]: iteration 284/ 3100 | consumed samples: 581632 | consumed tokens: 1191182336 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.043152E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 285/ 3100 | consumed samples: 583680 | consumed tokens: 1195376640 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.097698E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: 
iteration 286/ 3100 | consumed samples: 585728 | consumed tokens: 1199570944 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.125902E-01 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 287/ 3100 | consumed samples: 587776 | consumed tokens: 1203765248 | elapsed time per iteration (s): 142.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.110820E-01 | grad norm: 0.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.339 | TFLOPs: 146.38 | [default7]: iteration 288/ 3100 | consumed samples: 589824 | consumed tokens: 1207959552 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.186965E-01 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 289/ 3100 | consumed samples: 591872 | consumed tokens: 1212153856 | elapsed time per iteration (s): 140.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.285992E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.612 | TFLOPs: 149.16 | [default7]: iteration 290/ 3100 | consumed samples: 593920 | consumed tokens: 1216348160 | elapsed time per iteration (s): 142.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.057842E-01 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.333 | TFLOPs: 146.32 | [default7]: iteration 291/ 3100 | consumed samples: 595968 | consumed tokens: 1220542464 | elapsed time per iteration (s): 143.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.162170E-01 | grad norm: 0.393 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.322 | TFLOPs: 146.21 | [default7]: iteration 292/ 3100 | consumed samples: 598016 | consumed tokens: 1224736768 | elapsed time per iteration (s): 141.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.116029E-01 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.446 | TFLOPs: 147.47 | [default7]: iteration 293/ 3100 | consumed samples: 600064 | consumed tokens: 1228931072 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.144647E-01 | grad norm: 0.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 294/ 3100 | consumed samples: 602112 | consumed tokens: 1233125376 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.103790E-01 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 | [default7]: iteration 295/ 3100 | consumed samples: 604160 | consumed tokens: 1237319680 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.154240E-01 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 296/ 3100 | consumed samples: 606208 | consumed tokens: 1241513984 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.113785E-01 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 297/ 3100 | consumed samples: 608256 | consumed tokens: 1245708288 | elapsed 
time per iteration (s): 142.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.138956E-01 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.422 | TFLOPs: 147.23 | [default7]: iteration 298/ 3100 | consumed samples: 610304 | consumed tokens: 1249902592 | elapsed time per iteration (s): 142.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.063470E-01 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.393 | TFLOPs: 146.93 | [default7]: iteration 299/ 3100 | consumed samples: 612352 | consumed tokens: 1254096896 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.103228E-01 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.460 | TFLOPs: 147.61 | [default7]: iteration 300/ 3100 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.106885E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.412 | TFLOPs: 147.12 | [default7]: iteration 301/ 3100 | consumed samples: 616448 | consumed tokens: 1262485504 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.125404E-01 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 302/ 3100 | consumed samples: 618496 | consumed tokens: 1266679808 | elapsed time per iteration (s): 141.81 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.195854E-01 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.442 | TFLOPs: 147.43 | [default7]: iteration 303/ 3100 | consumed samples: 620544 | consumed tokens: 1270874112 | elapsed time per iteration (s): 142.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.382646E-01 | grad norm: 2.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.375 | TFLOPs: 146.75 | [default7]: iteration 304/ 3100 | consumed samples: 622592 | consumed tokens: 1275068416 | elapsed time per iteration (s): 141.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.017236E-01 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.516 | TFLOPs: 148.18 | [default7]: iteration 305/ 3100 | consumed samples: 624640 | consumed tokens: 1279262720 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.198709E-01 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 306/ 3100 | consumed samples: 626688 | consumed tokens: 1283457024 | elapsed time per iteration (s): 142.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.000538E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.387 | TFLOPs: 146.87 | [default7]: iteration 307/ 3100 | consumed samples: 628736 | consumed tokens: 1287651328 | elapsed time per iteration (s): 141.58 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.036929E-01 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.67 | [default7]: iteration 308/ 3100 | consumed samples: 630784 | consumed tokens: 1291845632 | elapsed time per iteration (s): 141.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 7.077256E-01 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 | [default7]: iteration 309/ 3100 | consumed samples: 632832 | consumed tokens: 1296039936 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.077795E-01 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.82 | [default7]: iteration 310/ 3100 | consumed samples: 634880 | consumed tokens: 1300234240 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.053276E-01 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 311/ 3100 | consumed samples: 636928 | consumed tokens: 1304428544 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.023997E-01 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.513 | TFLOPs: 148.16 | [default7]: iteration 312/ 3100 | consumed samples: 638976 | consumed tokens: 1308622848 | elapsed time per iteration (s): 141.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.182233E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.521 | TFLOPs: 148.23 | [default7]: iteration 313/ 3100 | consumed samples: 641024 | consumed tokens: 1312817152 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.021869E-01 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.02 | [default7]: iteration 314/ 3100 | consumed samples: 
643072 | consumed tokens: 1317011456 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.050720E-01 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 315/ 3100 | consumed samples: 645120 | consumed tokens: 1321205760 | elapsed time per iteration (s): 142.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.059147E-01 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.422 | TFLOPs: 147.23 | [default7]: iteration 316/ 3100 | consumed samples: 647168 | consumed tokens: 1325400064 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.092455E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 317/ 3100 | consumed samples: 649216 | consumed tokens: 1329594368 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.105572E-01 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 318/ 3100 | consumed samples: 651264 | consumed tokens: 1333788672 | elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.119920E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 319/ 3100 | consumed samples: 653312 | consumed tokens: 1337982976 | elapsed time per iteration (s): 141.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.991869E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 14.510 | TFLOPs: 148.13 | [default7]: iteration 320/ 3100 | consumed samples: 655360 | consumed tokens: 1342177280 | elapsed time per iteration (s): 141.85 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.070528E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.438 | TFLOPs: 147.39 | [default7]: iteration 321/ 3100 | consumed samples: 657408 | consumed tokens: 1346371584 | elapsed time per iteration (s): 141.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.070001E-01 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.454 | TFLOPs: 147.56 | [default7]: iteration 322/ 3100 | consumed samples: 659456 | consumed tokens: 1350565888 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.960377E-01 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 323/ 3100 | consumed samples: 661504 | consumed tokens: 1354760192 | elapsed time per iteration (s): 141.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.116372E-01 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.430 | TFLOPs: 147.31 | [default7]: iteration 324/ 3100 | consumed samples: 663552 | consumed tokens: 1358954496 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.995338E-01 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.99 | [default7]: iteration 325/ 3100 | consumed samples: 665600 | consumed tokens: 1363148800 | elapsed time per iteration (s): 141.33 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 7.116860E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 326/ 3100 | consumed samples: 667648 | consumed tokens: 1367343104 | elapsed time per iteration (s): 142.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.018552E-01 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.417 | TFLOPs: 147.18 | [default7]: iteration 327/ 3100 | consumed samples: 669696 | consumed tokens: 1371537408 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.014636E-01 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 328/ 3100 | consumed samples: 671744 | consumed tokens: 1375731712 | elapsed time per iteration (s): 142.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.046165E-01 | grad norm: 1.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.418 | TFLOPs: 147.19 | [default7]: iteration 329/ 3100 | consumed samples: 673792 | consumed tokens: 1379926016 | elapsed time per iteration (s): 141.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.977404E-01 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.431 | TFLOPs: 147.32 | [default7]: iteration 330/ 3100 | consumed samples: 675840 | consumed tokens: 1384120320 | elapsed time per iteration (s): 142.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.081459E-01 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.390 | TFLOPs: 146.90 | [default7]: 
iteration 331/ 3100 | consumed samples: 677888 | consumed tokens: 1388314624 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.042158E-01 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.468 | TFLOPs: 147.70 | [default7]: iteration 332/ 3100 | consumed samples: 679936 | consumed tokens: 1392508928 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.028491E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 | [default7]: iteration 333/ 3100 | consumed samples: 681984 | consumed tokens: 1396703232 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.004704E-01 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.14 | [default7]: iteration 334/ 3100 | consumed samples: 684032 | consumed tokens: 1400897536 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.993373E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.15 | [default7]: iteration 335/ 3100 | consumed samples: 686080 | consumed tokens: 1405091840 | elapsed time per iteration (s): 141.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.053851E-01 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.444 | TFLOPs: 147.45 | [default7]: iteration 336/ 3100 | consumed samples: 688128 | consumed tokens: 1409286144 | elapsed time per iteration (s): 142.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.052563E-01 | grad norm: 0.467 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.342 | TFLOPs: 146.41 | [default7]: iteration 337/ 3100 | consumed samples: 690176 | consumed tokens: 1413480448 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.034887E-01 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.468 | TFLOPs: 147.70 | [default7]: iteration 338/ 3100 | consumed samples: 692224 | consumed tokens: 1417674752 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.011221E-01 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.474 | TFLOPs: 147.75 | [default7]: iteration 339/ 3100 | consumed samples: 694272 | consumed tokens: 1421869056 | elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.970524E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 340/ 3100 | consumed samples: 696320 | consumed tokens: 1426063360 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.975540E-01 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 341/ 3100 | consumed samples: 698368 | consumed tokens: 1430257664 | elapsed time per iteration (s): 141.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.036514E-01 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.494 | TFLOPs: 147.96 | [default7]: iteration 342/ 3100 | consumed samples: 700416 | consumed tokens: 1434451968 | elapsed 
time per iteration (s): 141.68 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.041622E-01 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.455 | TFLOPs: 147.56 | [default7]: iteration 343/ 3100 | consumed samples: 702464 | consumed tokens: 1438646272 | elapsed time per iteration (s): 143.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.954121E-01 | grad norm: 1.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.321 | TFLOPs: 146.20 | [default7]: iteration 344/ 3100 | consumed samples: 704512 | consumed tokens: 1442840576 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.001410E-01 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.15 | [default7]: iteration 345/ 3100 | consumed samples: 706560 | consumed tokens: 1447034880 | elapsed time per iteration (s): 142.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.958237E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.378 | TFLOPs: 146.77 | [default7]: iteration 346/ 3100 | consumed samples: 708608 | consumed tokens: 1451229184 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 7.016610E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 347/ 3100 | consumed samples: 710656 | consumed tokens: 1455423488 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.934452E-01 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 348/ 3100 | consumed samples: 712704 | consumed tokens: 1459617792 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.981832E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 349/ 3100 | consumed samples: 714752 | consumed tokens: 1463812096 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.904901E-01 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 350/ 3100 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 142.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.878313E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.408 | TFLOPs: 147.08 | [default7]: iteration 351/ 3100 | consumed samples: 718848 | consumed tokens: 1472200704 | elapsed time per iteration (s): 142.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.952701E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.371 | TFLOPs: 146.70 | [default7]: iteration 352/ 3100 | consumed samples: 720896 | consumed tokens: 1476395008 | elapsed time per iteration (s): 141.75 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.928116E-01 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.448 | TFLOPs: 147.49 | [default7]: iteration 353/ 3100 | consumed samples: 722944 | consumed tokens: 1480589312 | elapsed time per iteration (s): 141.97 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 6.917469E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.426 | TFLOPs: 147.27 | [default7]: iteration 354/ 3100 | consumed samples: 724992 | consumed tokens: 1484783616 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.939887E-01 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 355/ 3100 | consumed samples: 727040 | consumed tokens: 1488977920 | elapsed time per iteration (s): 141.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.995186E-01 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.505 | TFLOPs: 148.08 | [default7]: iteration 356/ 3100 | consumed samples: 729088 | consumed tokens: 1493172224 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.971891E-01 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 357/ 3100 | consumed samples: 731136 | consumed tokens: 1497366528 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.917146E-01 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 358/ 3100 | consumed samples: 733184 | consumed tokens: 1501560832 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.938246E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 359/ 3100 | consumed samples: 
735232 | consumed tokens: 1505755136 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.959404E-01 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.74 | [default7]: iteration 360/ 3100 | consumed samples: 737280 | consumed tokens: 1509949440 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.885815E-01 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 361/ 3100 | consumed samples: 739328 | consumed tokens: 1514143744 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.896601E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 362/ 3100 | consumed samples: 741376 | consumed tokens: 1518338048 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.889599E-01 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.68 | [default7]: iteration 363/ 3100 | consumed samples: 743424 | consumed tokens: 1522532352 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.888181E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.05 | [default7]: iteration 364/ 3100 | consumed samples: 745472 | consumed tokens: 1526726656 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.904574E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 365/ 3100 | consumed samples: 747520 | consumed tokens: 1530920960 | elapsed time per iteration (s): 141.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.826552E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.428 | TFLOPs: 147.29 | [default7]: iteration 366/ 3100 | consumed samples: 749568 | consumed tokens: 1535115264 | elapsed time per iteration (s): 141.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.873802E-01 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.428 | TFLOPs: 147.29 | [default7]: iteration 367/ 3100 | consumed samples: 751616 | consumed tokens: 1539309568 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.819109E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 368/ 3100 | consumed samples: 753664 | consumed tokens: 1543503872 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.934603E-01 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 369/ 3100 | consumed samples: 755712 | consumed tokens: 1547698176 | elapsed time per iteration (s): 142.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.856543E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.341 | TFLOPs: 146.40 | [default7]: iteration 370/ 3100 | consumed samples: 757760 | consumed tokens: 1551892480 | elapsed time per iteration (s): 142.54 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 6.872048E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.368 | TFLOPs: 146.68 | [default7]: iteration 371/ 3100 | consumed samples: 759808 | consumed tokens: 1556086784 | elapsed time per iteration (s): 140.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.857621E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.532 | TFLOPs: 148.35 | [default7]: iteration 372/ 3100 | consumed samples: 761856 | consumed tokens: 1560281088 | elapsed time per iteration (s): 142.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.911130E-01 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.335 | TFLOPs: 146.34 | [default7]: iteration 373/ 3100 | consumed samples: 763904 | consumed tokens: 1564475392 | elapsed time per iteration (s): 141.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.784376E-01 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.446 | TFLOPs: 147.47 | [default7]: iteration 374/ 3100 | consumed samples: 765952 | consumed tokens: 1568669696 | elapsed time per iteration (s): 141.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.948165E-01 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.436 | TFLOPs: 147.37 | [default7]: iteration 375/ 3100 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 143.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.777077E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.302 | TFLOPs: 146.00 | [default7]: 
iteration 376/ 3100 | consumed samples: 770048 | consumed tokens: 1577058304 | elapsed time per iteration (s): 142.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.938125E-01 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.351 | TFLOPs: 146.50 | [default7]: iteration 377/ 3100 | consumed samples: 772096 | consumed tokens: 1581252608 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.871764E-01 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 378/ 3100 | consumed samples: 774144 | consumed tokens: 1585446912 | elapsed time per iteration (s): 141.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.869160E-01 | grad norm: 1.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.432 | TFLOPs: 147.33 | [default7]: iteration 379/ 3100 | consumed samples: 776192 | consumed tokens: 1589641216 | elapsed time per iteration (s): 142.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.822022E-01 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.331 | TFLOPs: 146.30 | [default7]: iteration 380/ 3100 | consumed samples: 778240 | consumed tokens: 1593835520 | elapsed time per iteration (s): 143.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.875823E-01 | grad norm: 1.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.269 | TFLOPs: 145.66 | [default7]: iteration 381/ 3100 | consumed samples: 780288 | consumed tokens: 1598029824 | elapsed time per iteration (s): 142.84 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.839366E-01 | grad norm: 0.349 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.337 | TFLOPs: 146.36 | [default7]: iteration 382/ 3100 | consumed samples: 782336 | consumed tokens: 1602224128 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.862720E-01 | grad norm: 2.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 383/ 3100 | consumed samples: 784384 | consumed tokens: 1606418432 | elapsed time per iteration (s): 142.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.963805E-01 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.343 | TFLOPs: 146.42 | [default7]: iteration 384/ 3100 | consumed samples: 786432 | consumed tokens: 1610612736 | elapsed time per iteration (s): 143.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.826634E-01 | grad norm: 0.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.272 | TFLOPs: 145.70 | [default7]: iteration 385/ 3100 | consumed samples: 788480 | consumed tokens: 1614807040 | elapsed time per iteration (s): 142.94 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.846268E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.327 | TFLOPs: 146.26 | [default7]: iteration 386/ 3100 | consumed samples: 790528 | consumed tokens: 1619001344 | elapsed time per iteration (s): 143.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.850324E-01 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.296 | TFLOPs: 145.94 | [default7]: iteration 387/ 3100 | consumed samples: 792576 | consumed tokens: 1623195648 | elapsed 
time per iteration (s): 143.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.794564E-01 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.256 | TFLOPs: 145.53 | [default7]: iteration 388/ 3100 | consumed samples: 794624 | consumed tokens: 1627389952 | elapsed time per iteration (s): 142.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.774528E-01 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.370 | TFLOPs: 146.70 | [default7]: iteration 389/ 3100 | consumed samples: 796672 | consumed tokens: 1631584256 | elapsed time per iteration (s): 142.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.901411E-01 | grad norm: 0.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.350 | TFLOPs: 146.49 | [default7]: iteration 390/ 3100 | consumed samples: 798720 | consumed tokens: 1635778560 | elapsed time per iteration (s): 142.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.815674E-01 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.401 | TFLOPs: 147.01 | [default7]: iteration 391/ 3100 | consumed samples: 800768 | consumed tokens: 1639972864 | elapsed time per iteration (s): 142.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.988052E-01 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.374 | TFLOPs: 146.73 | [default7]: iteration 392/ 3100 | consumed samples: 802816 | consumed tokens: 1644167168 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.706589E-01 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 393/ 3100 | consumed samples: 804864 | consumed tokens: 1648361472 | elapsed time per iteration (s): 142.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.794055E-01 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.380 | TFLOPs: 146.79 | [default7]: iteration 394/ 3100 | consumed samples: 806912 | consumed tokens: 1652555776 | elapsed time per iteration (s): 140.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.834093E-01 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.526 | TFLOPs: 148.28 | [default7]: iteration 395/ 3100 | consumed samples: 808960 | consumed tokens: 1656750080 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.804579E-01 | grad norm: 0.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 396/ 3100 | consumed samples: 811008 | consumed tokens: 1660944384 | elapsed time per iteration (s): 141.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.818128E-01 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.439 | TFLOPs: 147.40 | [default7]: iteration 397/ 3100 | consumed samples: 813056 | consumed tokens: 1665138688 | elapsed time per iteration (s): 142.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.799085E-01 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.359 | TFLOPs: 146.59 | [default7]: iteration 398/ 3100 | consumed samples: 815104 | consumed tokens: 1669332992 | elapsed time per iteration (s): 141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 6.830008E-01 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.03 | [default7]: iteration 399/ 3100 | consumed samples: 817152 | consumed tokens: 1673527296 | elapsed time per iteration (s): 142.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.816252E-01 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.350 | TFLOPs: 146.49 | [default7]: iteration 400/ 3100 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.716454E-01 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 401/ 3100 | consumed samples: 821248 | consumed tokens: 1681915904 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.777643E-01 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 402/ 3100 | consumed samples: 823296 | consumed tokens: 1686110208 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.740953E-01 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 403/ 3100 | consumed samples: 825344 | consumed tokens: 1690304512 | elapsed time per iteration (s): 142.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.761290E-01 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.354 | TFLOPs: 146.54 | [default7]: iteration 404/ 3100 | consumed samples: 
827392 | consumed tokens: 1694498816 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.814339E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 405/ 3100 | consumed samples: 829440 | consumed tokens: 1698693120 | elapsed time per iteration (s): 142.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.790276E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.362 | TFLOPs: 146.62 | [default7]: iteration 406/ 3100 | consumed samples: 831488 | consumed tokens: 1702887424 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.797514E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 407/ 3100 | consumed samples: 833536 | consumed tokens: 1707081728 | elapsed time per iteration (s): 142.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.789038E-01 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.412 | TFLOPs: 147.13 | [default7]: iteration 408/ 3100 | consumed samples: 835584 | consumed tokens: 1711276032 | elapsed time per iteration (s): 143.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.767972E-01 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.306 | TFLOPs: 146.05 | [default7]: iteration 409/ 3100 | consumed samples: 837632 | consumed tokens: 1715470336 | elapsed time per iteration (s): 143.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.806881E-01 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 14.321 | TFLOPs: 146.19 | [default7]: iteration 410/ 3100 | consumed samples: 839680 | consumed tokens: 1719664640 | elapsed time per iteration (s): 143.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.739011E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.316 | TFLOPs: 146.14 | [default7]: iteration 411/ 3100 | consumed samples: 841728 | consumed tokens: 1723858944 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.762956E-01 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 412/ 3100 | consumed samples: 843776 | consumed tokens: 1728053248 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.765112E-01 | grad norm: 0.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 413/ 3100 | consumed samples: 845824 | consumed tokens: 1732247552 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.760707E-01 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 414/ 3100 | consumed samples: 847872 | consumed tokens: 1736441856 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.761492E-01 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 415/ 3100 | consumed samples: 849920 | consumed tokens: 1740636160 | elapsed time per iteration (s): 141.28 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 6.839925E-01 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 416/ 3100 | consumed samples: 851968 | consumed tokens: 1744830464 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.758605E-01 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.15 | [default7]: iteration 417/ 3100 | consumed samples: 854016 | consumed tokens: 1749024768 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.759301E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.513 | TFLOPs: 148.15 | [default7]: iteration 418/ 3100 | consumed samples: 856064 | consumed tokens: 1753219072 | elapsed time per iteration (s): 141.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.675687E-01 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.59 | [default7]: iteration 419/ 3100 | consumed samples: 858112 | consumed tokens: 1757413376 | elapsed time per iteration (s): 140.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.665114E-01 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.534 | TFLOPs: 148.37 | [default7]: iteration 420/ 3100 | consumed samples: 860160 | consumed tokens: 1761607680 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.805221E-01 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.04 | [default7]: 
iteration 421/ 3100 | consumed samples: 862208 | consumed tokens: 1765801984 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.746015E-01 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.403 | TFLOPs: 147.03 | [default7]: iteration 422/ 3100 | consumed samples: 864256 | consumed tokens: 1769996288 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.800716E-01 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.01 | [default7]: iteration 423/ 3100 | consumed samples: 866304 | consumed tokens: 1774190592 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.782013E-01 | grad norm: 0.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 424/ 3100 | consumed samples: 868352 | consumed tokens: 1778384896 | elapsed time per iteration (s): 142.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.589020E-01 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.330 | TFLOPs: 146.29 | [default7]: iteration 425/ 3100 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.774007E-01 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 426/ 3100 | consumed samples: 872448 | consumed tokens: 1786773504 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.700762E-01 | grad norm: 0.335 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.507 | TFLOPs: 148.10 | [default7]: iteration 427/ 3100 | consumed samples: 874496 | consumed tokens: 1790967808 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.630461E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.05 | [default7]: iteration 428/ 3100 | consumed samples: 876544 | consumed tokens: 1795162112 | elapsed time per iteration (s): 140.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.650511E-01 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.527 | TFLOPs: 148.30 | [default7]: iteration 429/ 3100 | consumed samples: 878592 | consumed tokens: 1799356416 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.669528E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.71 | [default7]: iteration 430/ 3100 | consumed samples: 880640 | consumed tokens: 1803550720 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.684170E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 431/ 3100 | consumed samples: 882688 | consumed tokens: 1807745024 | elapsed time per iteration (s): 140.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.777127E-01 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.538 | TFLOPs: 148.41 | [default7]: iteration 432/ 3100 | consumed samples: 884736 | consumed tokens: 1811939328 | elapsed 
time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.720929E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 | [default7]: iteration 433/ 3100 | consumed samples: 886784 | consumed tokens: 1816133632 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.641888E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.96 | [default7]: iteration 434/ 3100 | consumed samples: 888832 | consumed tokens: 1820327936 | elapsed time per iteration (s): 142.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.760417E-01 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.363 | TFLOPs: 146.62 | [default7]: iteration 435/ 3100 | consumed samples: 890880 | consumed tokens: 1824522240 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.743129E-01 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.14 | [default7]: iteration 436/ 3100 | consumed samples: 892928 | consumed tokens: 1828716544 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.695344E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 437/ 3100 | consumed samples: 894976 | consumed tokens: 1832910848 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.781709E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 438/ 3100 | consumed samples: 897024 | consumed tokens: 1837105152 | elapsed time per iteration (s): 141.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.610411E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.460 | TFLOPs: 147.61 | [default7]: iteration 439/ 3100 | consumed samples: 899072 | consumed tokens: 1841299456 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.625028E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 440/ 3100 | consumed samples: 901120 | consumed tokens: 1845493760 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.677552E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.415 | TFLOPs: 147.15 | [default7]: iteration 441/ 3100 | consumed samples: 903168 | consumed tokens: 1849688064 | elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.698222E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 442/ 3100 | consumed samples: 905216 | consumed tokens: 1853882368 | elapsed time per iteration (s): 142.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.652297E-01 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.374 | TFLOPs: 146.73 | [default7]: iteration 443/ 3100 | consumed samples: 907264 | consumed tokens: 1858076672 | elapsed time per iteration (s): 142.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 6.663027E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.382 | TFLOPs: 146.81 | [default7]: iteration 444/ 3100 | consumed samples: 909312 | consumed tokens: 1862270976 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.597082E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 445/ 3100 | consumed samples: 911360 | consumed tokens: 1866465280 | elapsed time per iteration (s): 142.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.615746E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.415 | TFLOPs: 147.16 | [default7]: iteration 446/ 3100 | consumed samples: 913408 | consumed tokens: 1870659584 | elapsed time per iteration (s): 142.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.725365E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.404 | TFLOPs: 147.05 | [default7]: iteration 447/ 3100 | consumed samples: 915456 | consumed tokens: 1874853888 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.641551E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.71 | [default7]: iteration 448/ 3100 | consumed samples: 917504 | consumed tokens: 1879048192 | elapsed time per iteration (s): 142.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.647537E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.369 | TFLOPs: 146.68 | [default7]: iteration 449/ 3100 | consumed samples: 
919552 | consumed tokens: 1883242496 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.552919E-01 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.84 | [default7]: iteration 450/ 3100 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.637735E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 451/ 3100 | consumed samples: 923648 | consumed tokens: 1891631104 | elapsed time per iteration (s): 141.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.539033E-01 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 | [default7]: iteration 452/ 3100 | consumed samples: 925696 | consumed tokens: 1895825408 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.654511E-01 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.469 | TFLOPs: 147.70 | [default7]: iteration 453/ 3100 | consumed samples: 927744 | consumed tokens: 1900019712 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.595980E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 454/ 3100 | consumed samples: 929792 | consumed tokens: 1904214016 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.649406E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 455/ 3100 | consumed samples: 931840 | consumed tokens: 1908408320 | elapsed time per iteration (s): 140.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.582982E-01 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.526 | TFLOPs: 148.29 | [default7]: iteration 456/ 3100 | consumed samples: 933888 | consumed tokens: 1912602624 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.701876E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 457/ 3100 | consumed samples: 935936 | consumed tokens: 1916796928 | elapsed time per iteration (s): 141.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.577562E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 | [default7]: iteration 458/ 3100 | consumed samples: 937984 | consumed tokens: 1920991232 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.552910E-01 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 459/ 3100 | consumed samples: 940032 | consumed tokens: 1925185536 | elapsed time per iteration (s): 141.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.555737E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.456 | TFLOPs: 147.57 | [default7]: iteration 460/ 3100 | consumed samples: 942080 | consumed tokens: 1929379840 | elapsed time per iteration (s): 142.17 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 6.569854E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.405 | TFLOPs: 147.05 | [default7]: iteration 461/ 3100 | consumed samples: 944128 | consumed tokens: 1933574144 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.593604E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 462/ 3100 | consumed samples: 946176 | consumed tokens: 1937768448 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.613280E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 463/ 3100 | consumed samples: 948224 | consumed tokens: 1941962752 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.531106E-01 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.14 | [default7]: iteration 464/ 3100 | consumed samples: 950272 | consumed tokens: 1946157056 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.668794E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 465/ 3100 | consumed samples: 952320 | consumed tokens: 1950351360 | elapsed time per iteration (s): 141.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.638505E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 | [default7]: 
iteration 466/ 3100 | consumed samples: 954368 | consumed tokens: 1954545664 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.535510E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 467/ 3100 | consumed samples: 956416 | consumed tokens: 1958739968 | elapsed time per iteration (s): 140.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.663616E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.530 | TFLOPs: 148.33 | [default7]: iteration 468/ 3100 | consumed samples: 958464 | consumed tokens: 1962934272 | elapsed time per iteration (s): 141.02 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.597095E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.523 | TFLOPs: 148.26 | [default7]: iteration 469/ 3100 | consumed samples: 960512 | consumed tokens: 1967128576 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.594245E-01 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.71 | [default7]: iteration 470/ 3100 | consumed samples: 962560 | consumed tokens: 1971322880 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.761634E-01 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 471/ 3100 | consumed samples: 964608 | consumed tokens: 1975517184 | elapsed time per iteration (s): 142.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.589365E-01 | grad norm: 0.427 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.407 | TFLOPs: 147.07 | [default7]: iteration 472/ 3100 | consumed samples: 966656 | consumed tokens: 1979711488 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.567084E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 | [default7]: iteration 473/ 3100 | consumed samples: 968704 | consumed tokens: 1983905792 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.469390E-01 | grad norm: 1.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.459 | TFLOPs: 147.61 | [default7]: iteration 474/ 3100 | consumed samples: 970752 | consumed tokens: 1988100096 | elapsed time per iteration (s): 141.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.609930E-01 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.515 | TFLOPs: 148.17 | [default7]: iteration 475/ 3100 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.676587E-01 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 476/ 3100 | consumed samples: 974848 | consumed tokens: 1996488704 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.525621E-01 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 477/ 3100 | consumed samples: 976896 | consumed tokens: 2000683008 | elapsed 
time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.639950E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.415 | TFLOPs: 147.15 | [default7]: iteration 478/ 3100 | consumed samples: 978944 | consumed tokens: 2004877312 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.488777E-01 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.99 | [default7]: iteration 479/ 3100 | consumed samples: 980992 | consumed tokens: 2009071616 | elapsed time per iteration (s): 141.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.563019E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.443 | TFLOPs: 147.44 | [default7]: iteration 480/ 3100 | consumed samples: 983040 | consumed tokens: 2013265920 | elapsed time per iteration (s): 142.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.498606E-01 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.359 | TFLOPs: 146.58 | [default7]: iteration 481/ 3100 | consumed samples: 985088 | consumed tokens: 2017460224 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.494179E-01 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 482/ 3100 | consumed samples: 987136 | consumed tokens: 2021654528 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.590105E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 14.513 | TFLOPs: 148.15 | [default7]: iteration 483/ 3100 | consumed samples: 989184 | consumed tokens: 2025848832 | elapsed time per iteration (s): 142.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.559749E-01 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.351 | TFLOPs: 146.50 | [default7]: iteration 484/ 3100 | consumed samples: 991232 | consumed tokens: 2030043136 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.445537E-01 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.412 | TFLOPs: 147.12 | [default7]: iteration 485/ 3100 | consumed samples: 993280 | consumed tokens: 2034237440 | elapsed time per iteration (s): 141.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.527385E-01 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.428 | TFLOPs: 147.29 | [default7]: iteration 486/ 3100 | consumed samples: 995328 | consumed tokens: 2038431744 | elapsed time per iteration (s): 141.85 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.515503E-01 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.438 | TFLOPs: 147.39 | [default7]: iteration 487/ 3100 | consumed samples: 997376 | consumed tokens: 2042626048 | elapsed time per iteration (s): 143.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.491252E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.318 | TFLOPs: 146.17 | [default7]: iteration 488/ 3100 | consumed samples: 999424 | consumed tokens: 2046820352 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm 
loss: 6.500473E-01 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 489/ 3100 | consumed samples: 1001472 | consumed tokens: 2051014656 | elapsed time per iteration (s): 141.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.558899E-01 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.434 | TFLOPs: 147.35 | [default7]: iteration 490/ 3100 | consumed samples: 1003520 | consumed tokens: 2055208960 | elapsed time per iteration (s): 142.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.500557E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.378 | TFLOPs: 146.77 | [default7]: iteration 491/ 3100 | consumed samples: 1005568 | consumed tokens: 2059403264 | elapsed time per iteration (s): 141.75 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.549245E-01 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.448 | TFLOPs: 147.49 | [default7]: iteration 492/ 3100 | consumed samples: 1007616 | consumed tokens: 2063597568 | elapsed time per iteration (s): 142.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.607004E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.398 | TFLOPs: 146.98 | [default7]: iteration 493/ 3100 | consumed samples: 1009664 | consumed tokens: 2067791872 | elapsed time per iteration (s): 142.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.521153E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.406 | TFLOPs: 147.06 | [default7]: iteration 494/ 3100 | consumed samples: 
1011712 | consumed tokens: 2071986176 | elapsed time per iteration (s): 141.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.503247E-01 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.426 | TFLOPs: 147.27 | [default7]: iteration 495/ 3100 | consumed samples: 1013760 | consumed tokens: 2076180480 | elapsed time per iteration (s): 143.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.413254E-01 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.307 | TFLOPs: 146.05 | [default7]: iteration 496/ 3100 | consumed samples: 1015808 | consumed tokens: 2080374784 | elapsed time per iteration (s): 142.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.462809E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.387 | TFLOPs: 146.87 | [default7]: iteration 497/ 3100 | consumed samples: 1017856 | consumed tokens: 2084569088 | elapsed time per iteration (s): 142.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.481552E-01 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.390 | TFLOPs: 146.90 | [default7]: iteration 498/ 3100 | consumed samples: 1019904 | consumed tokens: 2088763392 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.534266E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default0]:saving checkpoint at iteration 498 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-04 15:32:41,968] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step498 is begin to save! 
[default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_52-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_09-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,090] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_54-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_35-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_45-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_61-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_58-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_16-model_00-model_states.pt... 
[default4]:[2022-09-04 15:32:42,097] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_47-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_10-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_60-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_19-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_20-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_25-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_04-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_71_model_states.pt... 
[default4]:[2022-09-04 15:32:42,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_71_model_states.pt. [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_53-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_21-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_30-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,097] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_24-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_66-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_11-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_38-model_00-model_states.pt... 
[default0]:[2022-09-04 15:32:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_01-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_72-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_37-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_48-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_15-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_49-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_67-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_13-model_00-model_states.pt... 
[default4]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_55-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_07-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_39-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_69-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_43-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_36-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_65-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_05-model_00-model_states.pt... 
[default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_12-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_44-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,090] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_03-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_57-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_32-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_41-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_59-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_08-model_00-model_states.pt... 
[default0]:[2022-09-04 15:32:42,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_40-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_70-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_50-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_06-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_56-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_27-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_33-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_28-model_00-model_states.pt... 
[default0]:[2022-09-04 15:32:42,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_14-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_63-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_17-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_22-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_68-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_71-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_42-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_26-model_00-model_states.pt... 
[default4]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_23-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,081] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_62-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_34-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_31-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_51-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_64-model_00-model_states.pt... [default4]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_29-model_00-model_states.pt... [default0]:[2022-09-04 15:32:42,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_18-model_00-model_states.pt... 
[default0]:[2022-09-04 15:32:42,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_46-model_00-model_states.pt... [default0]:[2022-09-04 15:32:45,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_72-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_74-model_00-model_states.pt... [default0]:[2022-09-04 15:32:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_74-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_70_model_states.pt... [default0]:[2022-09-04 15:32:45,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_70_model_states.pt. [default0]:[2022-09-04 15:32:45,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_28-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,246] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_26_model_states.pt... 
[default0]:[2022-09-04 15:32:45,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_26_model_states.pt. [default0]:[2022-09-04 15:32:45,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_22-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_20_model_states.pt... [default0]:[2022-09-04 15:32:45,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_20_model_states.pt. [default4]:[2022-09-04 15:32:45,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_41-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,348] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_39_model_states.pt... [default4]:[2022-09-04 15:32:45,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_39_model_states.pt. [default0]:[2022-09-04 15:32:45,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_34-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,307] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_32_model_states.pt... 
[default0]:[2022-09-04 15:32:45,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_32_model_states.pt. [default0]:[2022-09-04 15:32:45,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_04-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,394] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_02_model_states.pt... [default0]:[2022-09-04 15:32:45,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_02_model_states.pt. [default0]:[2022-09-04 15:32:45,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_68-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_66_model_states.pt... [default0]:[2022-09-04 15:32:45,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_66_model_states.pt. [default4]:[2022-09-04 15:32:45,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_25-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_23_model_states.pt... 
[default4]:[2022-09-04 15:32:45,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_23_model_states.pt. [default0]:[2022-09-04 15:32:45,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_40-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_38_model_states.pt... [default0]:[2022-09-04 15:32:45,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_38_model_states.pt. [default4]:[2022-09-04 15:32:45,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_23-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_21_model_states.pt... [default4]:[2022-09-04 15:32:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_21_model_states.pt. [default4]:[2022-09-04 15:32:45,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_45-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_43_model_states.pt... 
[default4]:[2022-09-04 15:32:45,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_43_model_states.pt. [default0]:[2022-09-04 15:32:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_16-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_14_model_states.pt... [default0]:[2022-09-04 15:32:45,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_14_model_states.pt. [default0]:[2022-09-04 15:32:45,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_20-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_18_model_states.pt... [default0]:[2022-09-04 15:32:45,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_18_model_states.pt. [default0]:[2022-09-04 15:32:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_24-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_22_model_states.pt... 
[default0]:[2022-09-04 15:32:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_22_model_states.pt. [default4]:[2022-09-04 15:32:45,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_15-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_13_model_states.pt... [default4]:[2022-09-04 15:32:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_13-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_11_model_states.pt... [default4]:[2022-09-04 15:32:45,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_11_model_states.pt. [default0]:[2022-09-04 15:32:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_50-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_48_model_states.pt... [default0]:[2022-09-04 15:32:45,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_48_model_states.pt. 
[default4]:[2022-09-04 15:32:45,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_33-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,563] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_31_model_states.pt... [default4]:[2022-09-04 15:32:45,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_31_model_states.pt. [default0]:[2022-09-04 15:32:45,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_06-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,590] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_04_model_states.pt... [default0]:[2022-09-04 15:32:45,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_04_model_states.pt. [default0]:[2022-09-04 15:32:45,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_14-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_12_model_states.pt... [default0]:[2022-09-04 15:32:45,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_12_model_states.pt. 
[default4]:[2022-09-04 15:32:45,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_17-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,662] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_15_model_states.pt... [default4]:[2022-09-04 15:32:45,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_15_model_states.pt. [default4]:[2022-09-04 15:32:45,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_29-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_27_model_states.pt... [default4]:[2022-09-04 15:32:45,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_27_model_states.pt. [default4]:[2022-09-04 15:32:45,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_35-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,684] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_33_model_states.pt... [default4]:[2022-09-04 15:32:45,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_33_model_states.pt. 
[default4]:[2022-09-04 15:32:45,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_61-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_59_model_states.pt... [default4]:[2022-09-04 15:32:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_59_model_states.pt. [default4]:[2022-09-04 15:32:45,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_11-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_09_model_states.pt... [default4]:[2022-09-04 15:32:45,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_13_model_states.pt. [default4]:[2022-09-04 15:32:45,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_55-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_53_model_states.pt... [default4]:[2022-09-04 15:32:45,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_53_model_states.pt. 
[default4]:[2022-09-04 15:32:45,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_07-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,678] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_05_model_states.pt... [default4]:[2022-09-04 15:32:45,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_05_model_states.pt. [default4]:[2022-09-04 15:32:45,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_69-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_67_model_states.pt... [default4]:[2022-09-04 15:32:45,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_67_model_states.pt. [default0]:[2022-09-04 15:32:45,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_44-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,688] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_42_model_states.pt... [default0]:[2022-09-04 15:32:45,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_42_model_states.pt. 
[default4]:[2022-09-04 15:32:45,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_03-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,666] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_01_model_states.pt... [default4]:[2022-09-04 15:32:45,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_01_model_states.pt. [default0]:[2022-09-04 15:32:45,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_08-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_06_model_states.pt... [default0]:[2022-09-04 15:32:45,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_06_model_states.pt. [default0]:[2022-09-04 15:32:45,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_54-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_52_model_states.pt... [default0]:[2022-09-04 15:32:45,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_52_model_states.pt. 
[default0]:[2022-09-04 15:32:45,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_60-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,758] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_58_model_states.pt... [default0]:[2022-09-04 15:32:45,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_58_model_states.pt. [default0]:[2022-09-04 15:32:45,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_10-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_08_model_states.pt... [default0]:[2022-09-04 15:32:45,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_08_model_states.pt. [default4]:[2022-09-04 15:32:45,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_21-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_19_model_states.pt... [default4]:[2022-09-04 15:32:45,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_19_model_states.pt. 
[default0]:[2022-09-04 15:32:45,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_66-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_64_model_states.pt... [default0]:[2022-09-04 15:32:45,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_64_model_states.pt. [default4]:[2022-09-04 15:32:45,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_09_model_states.pt. [default4]:[2022-09-04 15:32:45,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_37-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_35_model_states.pt... [default4]:[2022-09-04 15:32:45,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_35_model_states.pt. [default0]:[2022-09-04 15:32:45,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_36-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_34_model_states.pt... 
[default0]:[2022-09-04 15:32:45,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_34_model_states.pt. [default4]:[2022-09-04 15:32:45,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_65-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_63_model_states.pt... [default4]:[2022-09-04 15:32:45,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_63_model_states.pt. [default0]:[2022-09-04 15:32:45,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_12-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_10_model_states.pt... [default0]:[2022-09-04 15:32:45,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_10_model_states.pt. [default0]:[2022-09-04 15:32:45,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_32-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_30_model_states.pt... 
[default0]:[2022-09-04 15:32:45,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_30_model_states.pt. [default4]:[2022-09-04 15:32:45,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_27-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_25_model_states.pt... [default4]:[2022-09-04 15:32:45,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_25_model_states.pt. [default0]:[2022-09-04 15:32:45,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_26-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_24_model_states.pt... [default0]:[2022-09-04 15:32:45,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_24_model_states.pt. [default4]:[2022-09-04 15:32:45,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_51-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_49_model_states.pt... 
[default4]:[2022-09-04 15:32:45,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_49_model_states.pt. [default0]:[2022-09-04 15:32:45,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_64-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_62_model_states.pt... [default0]:[2022-09-04 15:32:45,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_62_model_states.pt. [default0]:[2022-09-04 15:32:45,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_46-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,882] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_44_model_states.pt... [default0]:[2022-09-04 15:32:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_44_model_states.pt. [default4]:[2022-09-04 15:32:45,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_09-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,886] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_07_model_states.pt... 
[default4]:[2022-09-04 15:32:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_07_model_states.pt. [default4]:[2022-09-04 15:32:45,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_47-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,890] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_45_model_states.pt... [default4]:[2022-09-04 15:32:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_45_model_states.pt. [default4]:[2022-09-04 15:32:45,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_53-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,876] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_51_model_states.pt... [default4]:[2022-09-04 15:32:45,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_51_model_states.pt. [default0]:[2022-09-04 15:32:45,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_30-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,857] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_28_model_states.pt... 
[default0]:[2022-09-04 15:32:45,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_28_model_states.pt. [default0]:[2022-09-04 15:32:45,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_48-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_46_model_states.pt... [default0]:[2022-09-04 15:32:45,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_46_model_states.pt. [default4]:[2022-09-04 15:32:45,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_49-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,894] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_47_model_states.pt... [default4]:[2022-09-04 15:32:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_47_model_states.pt. [default4]:[2022-09-04 15:32:45,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_67-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,850] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_65_model_states.pt... 
[default4]:[2022-09-04 15:32:45,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_65_model_states.pt. [default4]:[2022-09-04 15:32:45,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_43-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,882] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_41_model_states.pt... [default4]:[2022-09-04 15:32:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_41_model_states.pt. [default4]:[2022-09-04 15:32:45,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_05-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_03_model_states.pt... [default4]:[2022-09-04 15:32:45,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_03_model_states.pt. [default0]:[2022-09-04 15:32:45,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_70-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_68_model_states.pt... 
[default0]:[2022-09-04 15:32:45,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_68_model_states.pt. [default4]:[2022-09-04 15:32:45,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_71-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,883] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_69_model_states.pt... [default4]:[2022-09-04 15:32:45,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_69_model_states.pt. [default0]:[2022-09-04 15:32:45,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_42-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_40_model_states.pt... [default0]:[2022-09-04 15:32:45,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_40_model_states.pt. [default0]:[2022-09-04 15:32:45,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_62-model_00-model_states.pt. [default0]:[2022-09-04 15:32:45,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_60_model_states.pt... 
[default0]:[2022-09-04 15:32:45,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_60_model_states.pt. [default4]:[2022-09-04 15:32:45,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_31-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_29_model_states.pt... [default4]:[2022-09-04 15:32:45,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_29_model_states.pt. [default4]:[2022-09-04 15:32:45,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_63-model_00-model_states.pt. [default4]:[2022-09-04 15:32:45,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_61_model_states.pt... [default4]:[2022-09-04 15:32:45,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_61_model_states.pt. [default0]:[2022-09-04 15:32:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_18-model_00-model_states.pt. [default0]:[2022-09-04 15:32:46,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_16_model_states.pt... 
[default0]:[2022-09-04 15:32:46,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_16_model_states.pt. [default0]:[2022-09-04 15:32:46,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_52-model_00-model_states.pt. [default0]:[2022-09-04 15:32:46,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_50_model_states.pt... [default0]:[2022-09-04 15:32:46,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_50_model_states.pt. [default4]:[2022-09-04 15:32:46,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_19-model_00-model_states.pt. [default4]:[2022-09-04 15:32:46,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_17_model_states.pt... [default4]:[2022-09-04 15:32:46,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_17_model_states.pt. [default0]:[2022-09-04 15:32:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_38-model_00-model_states.pt. [default0]:[2022-09-04 15:32:46,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_36_model_states.pt... 
[default0]:[2022-09-04 15:32:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_36_model_states.pt. [default4]:[2022-09-04 15:32:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_39-model_00-model_states.pt. [default4]:[2022-09-04 15:32:46,062] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_37_model_states.pt... [default4]:[2022-09-04 15:32:46,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_37_model_states.pt. [default0]:[2022-09-04 15:32:46,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_56-model_00-model_states.pt. [default0]:[2022-09-04 15:32:46,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_54_model_states.pt... [default0]:[2022-09-04 15:32:46,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_54_model_states.pt. [default4]:[2022-09-04 15:32:46,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_57-model_00-model_states.pt. [default4]:[2022-09-04 15:32:46,233] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_55_model_states.pt... 
[default4]:[2022-09-04 15:32:46,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_55_model_states.pt. [default0]:[2022-09-04 15:32:46,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_58-model_00-model_states.pt. [default0]:[2022-09-04 15:32:46,225] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_56_model_states.pt... [default0]:[2022-09-04 15:32:46,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_56_model_states.pt. [default4]:[2022-09-04 15:32:46,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_59-model_00-model_states.pt. [default4]:[2022-09-04 15:32:46,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_57_model_states.pt... [default4]:[2022-09-04 15:32:46,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_57_model_states.pt. [default0]:[2022-09-04 15:32:46,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/layer_01-model_00-model_states.pt. 
[default0]:[2022-09-04 15:32:46,867] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_00_model_states.pt [default0]:[2022-09-04 15:32:46,867] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_00_model_states.pt... [default0]:[2022-09-04 15:32:46,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/mp_rank_00_model_states.pt. [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... 
[default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... 
[default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... 
[default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... 
[default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... 
[default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... 
[default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... 
[default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... 
[default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... 
[default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... 
[default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... 
[default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... 
[default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... 
[default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... 
[default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... 
[default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... 
[default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... 
[default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... 
[default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... 
[default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... 
[default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... 
[default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... 
[default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... 
[default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... 
[default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... 
[default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... 
[default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default0]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default7]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... 
[default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default3]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default4]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... [default1]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... 
[default6]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... [default5]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default2]:[2022-09-04 15:32:46,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default2]:[2022-09-04 15:32:54,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-04 15:32:54,563] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default2]:[2022-09-04 15:32:54,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. [default2]:[2022-09-04 15:32:54,678] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default3]:[2022-09-04 15:32:54,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. 
[default3]:[2022-09-04 15:32:54,790] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default2]:[2022-09-04 15:32:54,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. [default2]:[2022-09-04 15:32:54,744] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default0]:[2022-09-04 15:32:54,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-04 15:32:54,845] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default0]:[2022-09-04 15:32:54,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. [default0]:[2022-09-04 15:32:54,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default2]:[2022-09-04 15:32:54,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. 
[default2]:[2022-09-04 15:32:54,901] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default0]:[2022-09-04 15:32:54,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. [default0]:[2022-09-04 15:32:54,889] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default2]:[2022-09-04 15:32:54,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. [default2]:[2022-09-04 15:32:54,959] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default0]:[2022-09-04 15:32:54,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. [default0]:[2022-09-04 15:32:54,966] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default1]:[2022-09-04 15:32:54,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. 
[default1]:[2022-09-04 15:32:54,927] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default4]:[2022-09-04 15:32:54,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-04 15:32:54,966] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default2]:[2022-09-04 15:32:54,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. [default2]:[2022-09-04 15:32:54,973] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default4]:[2022-09-04 15:32:54,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. [default4]:[2022-09-04 15:32:54,958] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default7]:[2022-09-04 15:32:55,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. 
[default7]:[2022-09-04 15:32:55,010] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default3]:[2022-09-04 15:32:54,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. [default3]:[2022-09-04 15:32:54,985] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default1]:[2022-09-04 15:32:55,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. [default1]:[2022-09-04 15:32:55,083] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default0]:[2022-09-04 15:32:55,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-04 15:32:55,091] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default6]:[2022-09-04 15:32:55,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. 
[default6]:[2022-09-04 15:32:55,066] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default6]:[2022-09-04 15:32:55,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-04 15:32:55,077] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default6]:[2022-09-04 15:32:55,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. [default6]:[2022-09-04 15:32:55,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default3]:[2022-09-04 15:32:55,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-04 15:32:55,130] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default2]:[2022-09-04 15:32:55,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. 
[default2]:[2022-09-04 15:32:55,125] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default3]:[2022-09-04 15:32:55,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-04 15:32:55,204] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default1]:[2022-09-04 15:32:55,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. [default1]:[2022-09-04 15:32:55,214] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default2]:[2022-09-04 15:32:55,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. [default2]:[2022-09-04 15:32:55,231] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default1]:[2022-09-04 15:32:55,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. 
[default1]:[2022-09-04 15:32:55,172] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default0]:[2022-09-04 15:32:55,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. [default0]:[2022-09-04 15:32:55,234] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default2]:[2022-09-04 15:32:55,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-04 15:32:55,254] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default1]:[2022-09-04 15:32:55,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. [default1]:[2022-09-04 15:32:55,351] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default5]:[2022-09-04 15:32:55,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. 
[default5]:[2022-09-04 15:32:55,338] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default5]:[2022-09-04 15:32:55,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. [default5]:[2022-09-04 15:32:55,379] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default0]:[2022-09-04 15:32:55,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. [default0]:[2022-09-04 15:32:55,351] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default1]:[2022-09-04 15:32:55,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. [default1]:[2022-09-04 15:32:55,358] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default4]:[2022-09-04 15:32:55,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. 
[default4]:[2022-09-04 15:32:55,383] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default6]:[2022-09-04 15:32:55,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. [default6]:[2022-09-04 15:32:55,394] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default5]:[2022-09-04 15:32:55,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. [default5]:[2022-09-04 15:32:55,411] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default7]:[2022-09-04 15:32:55,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-04 15:32:55,408] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default4]:[2022-09-04 15:32:55,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. 
[default4]:[2022-09-04 15:32:55,433] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default0]:[2022-09-04 15:32:55,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. [default0]:[2022-09-04 15:32:55,484] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default7]:[2022-09-04 15:32:55,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-04 15:32:55,468] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default5]:[2022-09-04 15:32:55,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. [default5]:[2022-09-04 15:32:55,562] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default0]:[2022-09-04 15:32:55,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. 
[default0]:[2022-09-04 15:32:55,478] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default6]:[2022-09-04 15:32:55,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. [default6]:[2022-09-04 15:32:55,567] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default4]:[2022-09-04 15:32:55,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. [default4]:[2022-09-04 15:32:55,504] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default3]:[2022-09-04 15:32:55,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-04 15:32:55,539] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default3]:[2022-09-04 15:32:55,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. 
[default3]:[2022-09-04 15:32:55,569] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default6]:[2022-09-04 15:32:55,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-04 15:32:55,638] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default5]:[2022-09-04 15:32:55,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. [default5]:[2022-09-04 15:32:55,632] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default3]:[2022-09-04 15:32:55,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-04 15:32:55,688] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default3]:[2022-09-04 15:32:55,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. 
[default3]:[2022-09-04 15:32:55,680] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default5]:[2022-09-04 15:32:55,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-04 15:32:55,652] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default4]:[2022-09-04 15:32:55,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. [default4]:[2022-09-04 15:32:55,678] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default6]:[2022-09-04 15:32:55,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. [default6]:[2022-09-04 15:32:55,652] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default1]:[2022-09-04 15:32:55,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. 
[default1]:[2022-09-04 15:32:55,704] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default4]:[2022-09-04 15:32:55,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. [default4]:[2022-09-04 15:32:55,681] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default3]:[2022-09-04 15:32:55,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-04 15:32:55,746] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default6]:[2022-09-04 15:32:55,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-04 15:32:55,719] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default1]:[2022-09-04 15:32:55,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. 
[default1]:[2022-09-04 15:32:55,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default0]:[2022-09-04 15:32:55,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-04 15:32:55,714] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default2]:[2022-09-04 15:32:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-04 15:32:55,768] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default0]:[2022-09-04 15:32:55,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-04 15:32:55,794] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default2]:[2022-09-04 15:32:55,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. 
[default2]:[2022-09-04 15:32:55,825] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default7]:[2022-09-04 15:32:55,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-04 15:32:55,789] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default0]:[2022-09-04 15:32:55,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-04 15:32:55,811] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default6]:[2022-09-04 15:32:55,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. [default6]:[2022-09-04 15:32:55,836] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default3]:[2022-09-04 15:32:55,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. 
[default3]:[2022-09-04 15:32:55,776] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default5]:[2022-09-04 15:32:55,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-04 15:32:55,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default7]:[2022-09-04 15:32:55,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-04 15:32:55,829] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default0]:[2022-09-04 15:32:55,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. [default0]:[2022-09-04 15:32:55,863] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default7]:[2022-09-04 15:32:55,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. 
[default7]:[2022-09-04 15:32:55,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default5]:[2022-09-04 15:32:55,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. [default5]:[2022-09-04 15:32:55,821] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default0]:[2022-09-04 15:32:55,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. [default0]:[2022-09-04 15:32:55,875] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default7]:[2022-09-04 15:32:55,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. [default7]:[2022-09-04 15:32:55,850] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default6]:[2022-09-04 15:32:55,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. 
[default6]:[2022-09-04 15:32:55,884] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default4]:[2022-09-04 15:32:55,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. [default4]:[2022-09-04 15:32:55,816] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default1]:[2022-09-04 15:32:55,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. [default1]:[2022-09-04 15:32:55,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default0]:[2022-09-04 15:32:55,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-04 15:32:55,896] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default1]:[2022-09-04 15:32:55,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. 
[default1]:[2022-09-04 15:32:55,976] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default6]:[2022-09-04 15:32:55,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. [default6]:[2022-09-04 15:32:55,982] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default3]:[2022-09-04 15:32:55,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. [default3]:[2022-09-04 15:32:55,906] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default1]:[2022-09-04 15:32:56,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. [default1]:[2022-09-04 15:32:56,001] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default3]:[2022-09-04 15:32:55,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. 
[default3]:[2022-09-04 15:32:55,945] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default0]:[2022-09-04 15:32:56,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. [default0]:[2022-09-04 15:32:56,033] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default1]:[2022-09-04 15:32:55,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-04 15:32:55,954] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default3]:[2022-09-04 15:32:56,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-04 15:32:56,019] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default0]:[2022-09-04 15:32:56,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. 
[default0]:[2022-09-04 15:32:56,070] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default5]:[2022-09-04 15:32:56,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. [default5]:[2022-09-04 15:32:56,051] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default3]:[2022-09-04 15:32:56,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-04 15:32:56,048] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default3]:[2022-09-04 15:32:56,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. [default3]:[2022-09-04 15:32:56,011] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default4]:[2022-09-04 15:32:56,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. 
[default4]:[2022-09-04 15:32:56,059] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default5]:[2022-09-04 15:32:56,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-04 15:32:56,044] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default2]:[2022-09-04 15:32:56,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-04 15:32:56,108] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default7]:[2022-09-04 15:32:56,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-04 15:32:56,042] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default6]:[2022-09-04 15:32:56,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. 
[default6]:[2022-09-04 15:32:56,119] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default4]:[2022-09-04 15:32:56,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-04 15:32:56,048] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default2]:[2022-09-04 15:32:56,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-04 15:32:56,110] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default6]:[2022-09-04 15:32:56,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. [default6]:[2022-09-04 15:32:56,169] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default6]:[2022-09-04 15:32:56,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. 
[default6]:[2022-09-04 15:32:56,164] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default7]:[2022-09-04 15:32:56,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. [default7]:[2022-09-04 15:32:56,145] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default3]:[2022-09-04 15:32:56,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-04 15:32:56,208] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default0]:[2022-09-04 15:32:56,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-04 15:32:56,142] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default4]:[2022-09-04 15:32:56,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. 
[default4]:[2022-09-04 15:32:56,183] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default4]:[2022-09-04 15:32:56,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-04 15:32:56,239] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default6]:[2022-09-04 15:32:56,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-04 15:32:56,195] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default1]:[2022-09-04 15:32:56,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-04 15:32:56,240] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default7]:[2022-09-04 15:32:56,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. 
[default7]:[2022-09-04 15:32:56,207] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default1]:[2022-09-04 15:32:56,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-04 15:32:56,297] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default6]:[2022-09-04 15:32:56,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-04 15:32:56,298] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default5]:[2022-09-04 15:32:56,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. [default5]:[2022-09-04 15:32:56,276] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default7]:[2022-09-04 15:32:56,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. 
[default7]:[2022-09-04 15:32:56,285] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default5]:[2022-09-04 15:32:56,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. [default5]:[2022-09-04 15:32:56,244] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default7]:[2022-09-04 15:32:56,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. [default7]:[2022-09-04 15:32:56,257] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default5]:[2022-09-04 15:32:56,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-04 15:32:56,375] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default4]:[2022-09-04 15:32:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. 
[default4]:[2022-09-04 15:32:56,306] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default1]:[2022-09-04 15:32:56,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-04 15:32:56,356] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default4]:[2022-09-04 15:32:56,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-04 15:32:56,349] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default1]:[2022-09-04 15:32:56,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-04 15:32:56,392] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default2]:[2022-09-04 15:32:56,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. 
[default2]:[2022-09-04 15:32:56,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default2]:[2022-09-04 15:32:56,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. [default2]:[2022-09-04 15:32:56,345] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default5]:[2022-09-04 15:32:56,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-04 15:32:56,360] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default3]:[2022-09-04 15:32:56,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-04 15:32:56,429] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default7]:[2022-09-04 15:32:56,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. 
[default7]:[2022-09-04 15:32:56,408] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default0]:[2022-09-04 15:32:56,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. [default0]:[2022-09-04 15:32:56,376] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default2]:[2022-09-04 15:32:56,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. [default2]:[2022-09-04 15:32:56,366] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default5]:[2022-09-04 15:32:56,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. [default5]:[2022-09-04 15:32:56,397] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default6]:[2022-09-04 15:32:56,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. 
[default6]:[2022-09-04 15:32:56,450] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default6]:[2022-09-04 15:32:56,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. [default6]:[2022-09-04 15:32:56,445] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default3]:[2022-09-04 15:32:56,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. [default3]:[2022-09-04 15:32:56,513] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default2]:[2022-09-04 15:32:56,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. [default2]:[2022-09-04 15:32:56,466] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default4]:[2022-09-04 15:32:56,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. 
[default4]:[2022-09-04 15:32:56,485] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default7]:[2022-09-04 15:32:56,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-04 15:32:56,481] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default4]:[2022-09-04 15:32:56,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-04 15:32:56,512] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default5]:[2022-09-04 15:32:56,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-04 15:32:56,465] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default6]:[2022-09-04 15:32:56,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. 
[default6]:[2022-09-04 15:32:56,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default7]:[2022-09-04 15:32:56,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. [default7]:[2022-09-04 15:32:56,526] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default1]:[2022-09-04 15:32:56,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. [default1]:[2022-09-04 15:32:56,576] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default7]:[2022-09-04 15:32:56,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. [default7]:[2022-09-04 15:32:56,542] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default4]:[2022-09-04 15:32:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. 
[default4]:[2022-09-04 15:32:56,604] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default1]:[2022-09-04 15:32:56,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-04 15:32:56,556] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default5]:[2022-09-04 15:32:56,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. [default5]:[2022-09-04 15:32:56,610] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default3]:[2022-09-04 15:32:56,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-04 15:32:56,640] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default2]:[2022-09-04 15:32:56,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. 
[default2]:[2022-09-04 15:32:56,683] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default5]:[2022-09-04 15:32:56,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-04 15:32:56,669] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default6]:[2022-09-04 15:32:56,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-04 15:32:56,747] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default4]:[2022-09-04 15:32:56,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-04 15:32:56,728] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default2]:[2022-09-04 15:32:56,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. 
[default2]:[2022-09-04 15:32:56,728] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default6]:[2022-09-04 15:32:56,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. [default6]:[2022-09-04 15:32:56,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default7]:[2022-09-04 15:32:56,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. [default7]:[2022-09-04 15:32:56,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default5]:[2022-09-04 15:32:56,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-04 15:32:56,792] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default5]:[2022-09-04 15:32:56,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. 
[default5]:[2022-09-04 15:32:56,759] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default1]:[2022-09-04 15:32:56,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-04 15:32:56,783] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default7]:[2022-09-04 15:32:56,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. [default7]:[2022-09-04 15:32:56,861] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default3]:[2022-09-04 15:32:56,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. [default3]:[2022-09-04 15:32:56,868] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default5]:[2022-09-04 15:32:56,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. 
[default5]:[2022-09-04 15:32:56,898] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default7]:[2022-09-04 15:32:56,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. [default7]:[2022-09-04 15:32:56,853] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default6]:[2022-09-04 15:32:56,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. [default6]:[2022-09-04 15:32:56,904] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default4]:[2022-09-04 15:32:56,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-04 15:32:56,971] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default1]:[2022-09-04 15:32:56,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. 
[default1]:[2022-09-04 15:32:56,911] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default7]:[2022-09-04 15:32:57,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. [default7]:[2022-09-04 15:32:57,039] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default6]:[2022-09-04 15:32:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. [default6]:[2022-09-04 15:32:57,037] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default3]:[2022-09-04 15:32:57,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. [default3]:[2022-09-04 15:32:57,005] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default7]:[2022-09-04 15:32:57,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. 
[default7]:[2022-09-04 15:32:57,049] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default2]:[2022-09-04 15:32:57,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. [default2]:[2022-09-04 15:32:57,064] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default2]:[2022-09-04 15:32:57,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-04 15:32:57,050] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default0]:[2022-09-04 15:32:57,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. [default0]:[2022-09-04 15:32:57,074] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default2]:[2022-09-04 15:32:57,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. 
[default2]:[2022-09-04 15:32:57,131] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default3]:[2022-09-04 15:32:57,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-04 15:32:57,110] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default7]:[2022-09-04 15:32:57,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-04 15:32:57,158] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default5]:[2022-09-04 15:32:57,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. [default5]:[2022-09-04 15:32:57,136] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default4]:[2022-09-04 15:32:57,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. 
[default4]:[2022-09-04 15:32:57,201] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default0]:[2022-09-04 15:32:57,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. [default0]:[2022-09-04 15:32:57,248] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default3]:[2022-09-04 15:32:57,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-04 15:32:57,233] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default1]:[2022-09-04 15:32:57,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-04 15:32:57,239] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default0]:[2022-09-04 15:32:57,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. 
[default0]:[2022-09-04 15:32:57,251] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default6]:[2022-09-04 15:32:57,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. [default6]:[2022-09-04 15:32:57,351] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default2]:[2022-09-04 15:32:57,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. [default2]:[2022-09-04 15:32:57,337] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default5]:[2022-09-04 15:32:57,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-04 15:32:57,355] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default2]:[2022-09-04 15:32:57,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. 
[default2]:[2022-09-04 15:32:57,447] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default4]:[2022-09-04 15:32:57,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-04 15:32:57,541] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default2]:[2022-09-04 15:32:57,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. [default2]:[2022-09-04 15:32:57,542] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default5]:[2022-09-04 15:32:57,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. [default5]:[2022-09-04 15:32:57,598] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default2]:[2022-09-04 15:32:57,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. 
[default2]:[2022-09-04 15:32:57,557] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default3]:[2022-09-04 15:32:57,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-04 15:32:57,555] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default4]:[2022-09-04 15:32:57,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-04 15:32:57,629] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default7]:[2022-09-04 15:32:57,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. [default7]:[2022-09-04 15:32:57,588] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default4]:[2022-09-04 15:32:57,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. 
[default4]:[2022-09-04 15:32:57,655] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default7]:[2022-09-04 15:32:57,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-04 15:32:57,682] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default0]:[2022-09-04 15:32:57,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. [default0]:[2022-09-04 15:32:57,807] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default3]:[2022-09-04 15:32:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-04 15:32:57,790] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default6]:[2022-09-04 15:32:57,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. 
[default6]:[2022-09-04 15:32:57,803] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default4]:[2022-09-04 15:32:57,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-04 15:32:57,961] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default1]:[2022-09-04 15:32:57,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-04 15:32:57,884] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default1]:[2022-09-04 15:32:57,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. [default1]:[2022-09-04 15:32:57,984] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default0]:[2022-09-04 15:32:58,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. 
[default0]:[2022-09-04 15:32:58,002] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default7]:[2022-09-04 15:32:57,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. [default7]:[2022-09-04 15:32:57,987] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default7]:[2022-09-04 15:32:58,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. [default7]:[2022-09-04 15:32:58,268] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default6]:[2022-09-04 15:32:58,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-04 15:32:58,366] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default0]:[2022-09-04 15:32:58,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. 
[default0]:[2022-09-04 15:32:58,407] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default5]:[2022-09-04 15:32:58,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. [default5]:[2022-09-04 15:32:58,385] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default4]:[2022-09-04 15:32:58,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. [default4]:[2022-09-04 15:32:58,503] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default1]:[2022-09-04 15:32:58,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-04 15:32:58,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default1]:[2022-09-04 15:32:58,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. 
[default1]:[2022-09-04 15:32:58,500] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default5]:[2022-09-04 15:32:58,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. [default5]:[2022-09-04 15:32:58,947] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default3]:[2022-09-04 15:32:58,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. [default3]:[2022-09-04 15:32:58,980] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default0]:[2022-09-04 15:32:59,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-04 15:32:59,144] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default5]:[2022-09-04 15:32:59,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. 
[default5]:[2022-09-04 15:32:59,139] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default1]:[2022-09-04 15:32:59,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. [default1]:[2022-09-04 15:32:59,265] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default0]:[2022-09-04 15:32:59,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. [default0]:[2022-09-04 15:32:59,339] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default4]:[2022-09-04 15:32:59,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-04 15:32:59,412] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default1]:[2022-09-04 15:32:59,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. 
[default1]:[2022-09-04 15:32:59,457] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default3]:[2022-09-04 15:32:59,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-04 15:32:59,504] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default1]:[2022-09-04 15:32:59,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. [default1]:[2022-09-04 15:32:59,555] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default3]:[2022-09-04 15:32:59,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-04 15:32:59,653] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default3]:[2022-09-04 15:32:59,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. 
[default3]:[2022-09-04 15:32:59,813] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default0]:[2022-09-04 15:33:00,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-04 15:33:00,484] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default6]:[2022-09-04 15:33:00,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. [default6]:[2022-09-04 15:33:00,486] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default6]:[2022-09-04 15:33:00,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. [default6]:[2022-09-04 15:33:00,513] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default6]:[2022-09-04 15:33:00,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. 
[default6]:[2022-09-04 15:33:00,557] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default7]:[2022-09-04 15:33:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. [default7]:[2022-09-04 15:33:00,663] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default2]:[2022-09-04 15:33:00,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. [default2]:[2022-09-04 15:33:00,742] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default1]:[2022-09-04 15:33:00,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. [default1]:[2022-09-04 15:33:00,762] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default2]:[2022-09-04 15:33:01,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. 
[default2]:[2022-09-04 15:33:01,143] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default5]:[2022-09-04 15:33:01,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. [default5]:[2022-09-04 15:33:01,235] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default4]:[2022-09-04 15:33:01,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. [default4]:[2022-09-04 15:33:01,370] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default7]:[2022-09-04 15:33:01,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. [default7]:[2022-09-04 15:33:01,431] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default7]:[2022-09-04 15:33:01,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. 
[default7]:[2022-09-04 15:33:01,458] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default1]:[2022-09-04 15:33:01,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-04 15:33:01,465] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default3]:[2022-09-04 15:33:01,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-04 15:33:01,529] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default4]:[2022-09-04 15:33:01,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-04 15:33:01,598] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default3]:[2022-09-04 15:33:01,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. 
[default3]:[2022-09-04 15:33:01,787] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default2]:[2022-09-04 15:33:01,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-04 15:33:01,981] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default1]:[2022-09-04 15:33:02,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-04 15:33:02,088] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default2]:[2022-09-04 15:33:02,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-04 15:33:02,077] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default3]:[2022-09-04 15:33:02,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. 
[default3]:[2022-09-04 15:33:02,160] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default2]:[2022-09-04 15:33:02,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. [default2]:[2022-09-04 15:33:02,221] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default4]:[2022-09-04 15:33:02,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. [default4]:[2022-09-04 15:33:02,184] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default0]:[2022-09-04 15:33:02,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. [default0]:[2022-09-04 15:33:02,311] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default1]:[2022-09-04 15:33:02,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. 
[default1]:[2022-09-04 15:33:02,613] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default0]:[2022-09-04 15:33:02,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. [default0]:[2022-09-04 15:33:02,547] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default0]:[2022-09-04 15:33:02,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-04 15:33:02,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default5]:[2022-09-04 15:33:02,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-04 15:33:02,844] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default7]:[2022-09-04 15:33:02,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. 
[default7]:[2022-09-04 15:33:02,849] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default4]:[2022-09-04 15:33:03,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-04 15:33:03,107] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default2]:[2022-09-04 15:33:03,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. [default2]:[2022-09-04 15:33:03,267] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default0]:[2022-09-04 15:33:03,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. [default0]:[2022-09-04 15:33:03,303] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default5]:[2022-09-04 15:33:03,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. 
[default5]:[2022-09-04 15:33:03,361] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default5]:[2022-09-04 15:33:03,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-04 15:33:03,552] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default5]:[2022-09-04 15:33:03,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. [default5]:[2022-09-04 15:33:03,695] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default7]:[2022-09-04 15:33:03,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. [default7]:[2022-09-04 15:33:03,711] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default2]:[2022-09-04 15:33:03,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. 
[default2]:[2022-09-04 15:33:03,790] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default4]:[2022-09-04 15:33:03,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. [default4]:[2022-09-04 15:33:03,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt [default0]:[2022-09-04 15:33:03,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-04 15:33:03,895] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default3]:[2022-09-04 15:33:04,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. [default3]:[2022-09-04 15:33:04,085] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default3]:[2022-09-04 15:33:04,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. 
[default3]:[2022-09-04 15:33:04,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default1]:[2022-09-04 15:33:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-04 15:33:04,150] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default6]:[2022-09-04 15:33:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. [default6]:[2022-09-04 15:33:04,122] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default6]:[2022-09-04 15:33:04,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. [default6]:[2022-09-04 15:33:04,177] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default5]:[2022-09-04 15:33:04,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. 
[default5]:[2022-09-04 15:33:04,259] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default4]:[2022-09-04 15:33:04,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-04 15:33:04,238] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default7]:[2022-09-04 15:33:04,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. [default7]:[2022-09-04 15:33:04,304] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default2]:[2022-09-04 15:33:04,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. [default2]:[2022-09-04 15:33:04,337] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default6]:[2022-09-04 15:33:04,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. 
[default6]:[2022-09-04 15:33:04,295] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default7]:[2022-09-04 15:33:04,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. [default7]:[2022-09-04 15:33:04,494] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt [default5]:[2022-09-04 15:33:04,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. [default5]:[2022-09-04 15:33:04,464] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default4]:[2022-09-04 15:33:04,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-04 15:33:04,557] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default6]:[2022-09-04 15:33:04,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. 
[default6]:[2022-09-04 15:33:04,612] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default6]:[2022-09-04 15:33:04,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. [default6]:[2022-09-04 15:33:04,576] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default4]:[2022-09-04 15:33:04,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. [default4]:[2022-09-04 15:33:04,669] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default1]:[2022-09-04 15:33:04,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-04 15:33:04,704] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default0]:[2022-09-04 15:33:04,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. 
[default0]:[2022-09-04 15:33:04,779] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default2]:[2022-09-04 15:33:05,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. [default2]:[2022-09-04 15:33:05,436] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default3]:[2022-09-04 15:33:05,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. [default3]:[2022-09-04 15:33:05,878] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default7]:[2022-09-04 15:33:06,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. [default7]:[2022-09-04 15:33:06,225] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default6]:[2022-09-04 15:33:06,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. 
[default6]:[2022-09-04 15:33:06,327] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default1]:[2022-09-04 15:33:06,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-04 15:33:06,934] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default0]:[2022-09-04 15:33:07,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [default0]:[2022-09-04 15:33:07,753] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [default7]:[2022-09-04 15:33:08,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-04 15:33:08,054] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default6]:[2022-09-04 15:33:08,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. 
[default6]:[2022-09-04 15:33:08,120] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default2]:[2022-09-04 15:33:08,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. [default2]:[2022-09-04 15:33:08,175] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default7]:[2022-09-04 15:33:08,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-04 15:33:08,279] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default1]:[2022-09-04 15:33:08,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-04 15:33:08,309] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default3]:[2022-09-04 15:33:08,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. 
[default3]:[2022-09-04 15:33:08,385] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default0]:[2022-09-04 15:33:08,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-04 15:33:08,353] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default5]:[2022-09-04 15:33:08,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-04 15:33:08,588] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default4]:[2022-09-04 15:33:08,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-04 15:33:08,883] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default5]:[2022-09-04 15:33:09,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. 
[default5]:[2022-09-04 15:33:09,053] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:time (ms) | save-checkpoint: 27216.61 [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default0]: successfully saved checkpoint at iteration 498 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default6]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default2]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. 
[default4]:[2022-09-04 15:33:09,182] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step498/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default0]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default3]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default5]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default1]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default7]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! [default4]:[2022-09-04 15:33:09,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step498 is ready now! 
[default7]: iteration 499/ 3100 | consumed samples: 1021952 | consumed tokens: 2092957696 | elapsed time per iteration (s): 168.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.522985E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 12.147 | TFLOPs: 124.00 |
[default7]: iteration 500/ 3100 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.491523E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 |
[default7]:----------------------------------------------------------------------------------------------------------
[default7]:validation_pretraining loss at iteration 500 | lm loss value: 2.496777E+00 | lm loss PPL: 1.214330E+01 |
[default7]:----------------------------------------------------------------------------------------------------------
[default7]:-----------------------------------------------------------------------------------------
[default7]:valid loss at iteration 500 | lm loss value: 1.263814E+00 | lm loss PPL: 3.538892E+00 |
[default7]:-----------------------------------------------------------------------------------------
[default7]: iteration 501/ 3100 | consumed samples: 1026048 | consumed tokens: 2101346304 | elapsed time per iteration (s): 228.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.477938E-01 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 8.946 | TFLOPs: 91.32 |
[default7]: iteration 502/ 3100 | consumed samples: 1028096 | consumed tokens: 2105540608 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.447690E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.82 |
[default7]: iteration 503/ 3100 | consumed samples: 1030144 | consumed tokens: 2109734912 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.458139E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.469 | TFLOPs: 147.71 |
[default7]: iteration 504/ 3100 | consumed samples: 1032192 | consumed tokens: 2113929216 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.526582E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 |
[default7]: iteration 505/ 3100 | consumed samples: 1034240 | consumed tokens: 2118123520 | elapsed time per iteration (s): 142.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.483792E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.350 | TFLOPs: 146.49 |
[default7]: iteration 506/ 3100 | consumed samples: 1036288 | consumed tokens: 2122317824 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.521550E-01 | grad norm: 0.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 507/ 3100 | consumed samples: 1038336 | consumed tokens: 2126512128 | elapsed time per iteration (s): 142.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.441926E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.382 | TFLOPs: 146.82 |
[default7]: iteration 508/ 3100 | consumed samples: 1040384 | consumed tokens: 2130706432 | elapsed time per iteration (s): 142.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.443005E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.417 | TFLOPs: 147.18 |
[default7]: iteration 509/ 3100 | consumed samples: 1042432 | consumed tokens: 2134900736 | elapsed time per iteration (s): 141.95 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.545366E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.428 | TFLOPs: 147.28 |
[default7]: iteration 510/ 3100 | consumed samples: 1044480 | consumed tokens: 2139095040 | elapsed time per iteration (s): 143.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.574253E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.317 | TFLOPs: 146.15 |
[default7]: iteration 511/ 3100 | consumed samples: 1046528 | consumed tokens: 2143289344 | elapsed time per iteration (s): 142.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.508285E-01 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.387 | TFLOPs: 146.87 |
[default7]: iteration 512/ 3100 | consumed samples: 1048576 | consumed tokens: 2147483648 | elapsed time per iteration (s): 142.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.552221E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.379 | TFLOPs: 146.78 |
[default7]: iteration 513/ 3100 | consumed samples: 1050624 | consumed tokens: 2151677952 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.577213E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 |
[default7]: iteration 514/ 3100 | consumed samples: 1052672 | consumed tokens: 2155872256 | elapsed time per iteration (s): 142.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.433746E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 |
[default7]: iteration 515/ 3100 | consumed samples: 1054720 | consumed tokens: 2160066560 | elapsed time per iteration (s): 141.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.413817E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.494 | TFLOPs: 147.96 |
[default7]: iteration 516/ 3100 | consumed samples: 1056768 | consumed tokens: 2164260864 | elapsed time per iteration (s): 142.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.523158E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.363 | TFLOPs: 146.62 |
[default7]: iteration 517/ 3100 | consumed samples: 1058816 | consumed tokens: 2168455168 | elapsed time per iteration (s): 142.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.418939E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.410 | TFLOPs: 147.11 |
[default7]: iteration 518/ 3100 | consumed samples: 1060864 | consumed tokens: 2172649472 | elapsed time per iteration (s): 141.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.509768E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.522 | TFLOPs: 148.24 |
[default7]: iteration 519/ 3100 | consumed samples: 1062912 | consumed tokens: 2176843776 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.483197E-01 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 |
[default7]: iteration 520/ 3100 | consumed samples: 1064960 | consumed tokens: 2181038080 | elapsed time per iteration (s): 142.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.427705E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.419 | TFLOPs: 147.20 |
[default7]: iteration 521/ 3100 | consumed samples: 1067008 | consumed tokens: 2185232384 | elapsed time per iteration (s): 142.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.507918E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.324 | TFLOPs: 146.22 |
[default7]: iteration 522/ 3100 | consumed samples: 1069056 | consumed tokens: 2189426688 | elapsed time per iteration (s): 142.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.391470E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.416 | TFLOPs: 147.17 |
[default7]: iteration 523/ 3100 | consumed samples: 1071104 | consumed tokens: 2193620992 | elapsed time per iteration (s): 142.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.472063E-01 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.378 | TFLOPs: 146.78 |
[default7]: iteration 524/ 3100 | consumed samples: 1073152 | consumed tokens: 2197815296 | elapsed time per iteration (s): 142.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.404893E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.344 | TFLOPs: 146.43 |
[default7]: iteration 525/ 3100 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.427937E-01 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 |
[default7]: iteration 526/ 3100 | consumed samples: 1077248 | consumed tokens: 2206203904 | elapsed time per iteration (s): 143.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.486887E-01 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.298 | TFLOPs: 145.96 |
[default7]: iteration 527/ 3100 | consumed samples: 1079296 | consumed tokens: 2210398208 | elapsed time per iteration (s): 142.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.249582E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.328 | TFLOPs: 146.27 |
[default7]: iteration 528/ 3100 | consumed samples: 1081344 | consumed tokens: 2214592512 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.390198E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.88 |
[default7]: iteration 529/ 3100 | consumed samples: 1083392 | consumed tokens: 2218786816 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.353882E-01 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.333 | TFLOPs: 146.32 |
[default7]: iteration 530/ 3100 | consumed samples: 1085440 | consumed tokens: 2222981120 | elapsed time per iteration (s): 145.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.380201E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.114 | TFLOPs: 144.09 |
[default7]: iteration 531/ 3100 | consumed samples: 1087488 | consumed tokens: 2227175424 | elapsed time per iteration (s): 142.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.407371E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.398 | TFLOPs: 146.98 |
[default7]: iteration 532/ 3100 | consumed samples: 1089536 | consumed tokens: 2231369728 | elapsed time per iteration (s): 144.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.363718E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.189 | TFLOPs: 144.85 |
[default7]: iteration 533/ 3100 | consumed samples: 1091584 | consumed tokens: 2235564032 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.335671E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.67 |
[default7]: iteration 534/ 3100 | consumed samples: 1093632 | consumed tokens: 2239758336 | elapsed time per iteration (s): 146.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.337464E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.005 | TFLOPs: 142.97 |
[default7]: iteration 535/ 3100 | consumed samples: 1095680 | consumed tokens: 2243952640 | elapsed time per iteration (s): 141.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.380771E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.453 | TFLOPs: 147.55 |
[default7]: iteration 536/ 3100 | consumed samples: 1097728 | consumed tokens: 2248146944 | elapsed time per iteration (s): 141.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.443495E-01 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.425 | TFLOPs: 147.25 |
[default7]: iteration 537/ 3100 | consumed samples: 1099776 | consumed tokens: 2252341248 | elapsed time per iteration (s): 144.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.478887E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.216 | TFLOPs: 145.12 |
[default7]: iteration 538/ 3100 | consumed samples: 1101824 | consumed tokens: 2256535552 | elapsed time per iteration (s): 144.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.366228E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.222 | TFLOPs: 145.18 |
[default7]: iteration 539/ 3100 | consumed samples: 1103872 | consumed tokens: 2260729856 | elapsed time per iteration (s): 142.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.423724E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.374 | TFLOPs: 146.74 |
[default7]: iteration 540/ 3100 | consumed samples: 1105920 | consumed tokens: 2264924160 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.389771E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.78 |
[default7]: iteration 541/ 3100 | consumed samples: 1107968 | consumed tokens: 2269118464 | elapsed time per iteration (s): 142.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.370302E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.419 | TFLOPs: 147.19 |
[default7]: iteration 542/ 3100 | consumed samples: 1110016 | consumed tokens: 2273312768 | elapsed time per iteration (s): 141.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.355914E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.447 | TFLOPs: 147.49 |
[default7]: iteration 543/ 3100 | consumed samples: 1112064 | consumed tokens: 2277507072 | elapsed time per iteration (s): 142.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.399578E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.395 | TFLOPs: 146.96 |
[default7]: iteration 544/ 3100 | consumed samples: 1114112 | consumed tokens: 2281701376 | elapsed time per iteration (s): 143.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.330737E-01 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.318 | TFLOPs: 146.17 |
[default7]: iteration 545/ 3100 | consumed samples: 1116160 | consumed tokens: 2285895680 | elapsed time per iteration (s): 142.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.430470E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.336 | TFLOPs: 146.35 |
[default7]: iteration 546/ 3100 | consumed samples: 1118208 | consumed tokens: 2290089984 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.287038E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 |
[default7]: iteration 547/ 3100 | consumed samples: 1120256 | consumed tokens: 2294284288 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.398586E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 |
[default7]: iteration 548/ 3100 | consumed samples: 1122304 | consumed tokens: 2298478592 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.369975E-01 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.434 | TFLOPs: 147.35 |
[default7]: iteration 549/ 3100 | consumed samples: 1124352 | consumed tokens: 2302672896 | elapsed time per iteration (s): 143.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.385526E-01 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.285 | TFLOPs: 145.83 |
[default7]: iteration 550/ 3100 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.355174E-01 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 |
[default7]: iteration 551/ 3100 | consumed samples: 1128448 | consumed tokens: 2311061504 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.299059E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.13 |
[default7]: iteration 552/ 3100 | consumed samples: 1130496 | consumed tokens: 2315255808 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.439530E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 553/ 3100 | consumed samples: 1132544 | consumed tokens: 2319450112 | elapsed time per iteration (s): 141.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.321048E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.433 | TFLOPs: 147.34 |
[default7]: iteration 554/ 3100 | consumed samples: 1134592 | consumed tokens: 2323644416 | elapsed time per iteration (s): 141.85 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.287456E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.438 | TFLOPs: 147.39 |
[default7]: iteration 555/ 3100 | consumed samples: 1136640 | consumed tokens: 2327838720 | elapsed time per iteration (s): 142.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.390953E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.370 | TFLOPs: 146.70 |
[default7]: iteration 556/ 3100 | consumed samples: 1138688 | consumed tokens: 2332033024 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.338806E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 |
[default7]: iteration 557/ 3100 | consumed samples: 1140736 | consumed tokens: 2336227328 | elapsed time per iteration (s): 143.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.333476E-01 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.315 | TFLOPs: 146.13 |
[default7]: iteration 558/ 3100 | consumed samples: 1142784 | consumed tokens: 2340421632 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.333123E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.468 | TFLOPs: 147.70 |
[default7]: iteration 559/ 3100 | consumed samples: 1144832 | consumed tokens: 2344615936 | elapsed time per iteration (s): 142.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.298840E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.372 | TFLOPs: 146.72 |
[default7]: iteration 560/ 3100 | consumed samples: 1146880 | consumed tokens: 2348810240 | elapsed time per iteration (s): 142.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.295758E-01 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.387 | TFLOPs: 146.87 |
[default7]: iteration 561/ 3100 | consumed samples: 1148928 | consumed tokens: 2353004544 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.348807E-01 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 |
[default7]: iteration 562/ 3100 | consumed samples: 1150976 | consumed tokens: 2357198848 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.368784E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.73 |
[default7]: iteration 563/ 3100 | consumed samples: 1153024 | consumed tokens: 2361393152 | elapsed time per iteration (s): 141.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.348085E-01 | grad norm: 1.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 |
[default7]: iteration 564/ 3100 | consumed samples: 1155072 | consumed tokens: 2365587456 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.333579E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 565/ 3100 | consumed samples: 1157120 | consumed tokens: 2369781760 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.294895E-01 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 566/ 3100 | consumed samples: 1159168 | consumed tokens: 2373976064 | elapsed time per iteration (s): 141.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.421453E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.441 | TFLOPs: 147.42 |
[default7]: iteration 567/ 3100 | consumed samples: 1161216 | consumed tokens: 2378170368 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.291997E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 568/ 3100 | consumed samples: 1163264 | consumed tokens: 2382364672 | elapsed time per iteration (s): 142.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.344793E-01 | grad norm: 0.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.396 | TFLOPs: 146.96 |
[default7]: iteration 569/ 3100 | consumed samples: 1165312 | consumed tokens: 2386558976 | elapsed time per iteration (s): 141.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.209872E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.58 |
[default7]: iteration 570/ 3100 | consumed
samples: 1167360 | consumed tokens: 2390753280 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.342874E-01 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.78 | [default7]: iteration 571/ 3100 | consumed samples: 1169408 | consumed tokens: 2394947584 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.284720E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 572/ 3100 | consumed samples: 1171456 | consumed tokens: 2399141888 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.225440E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.450 | TFLOPs: 147.51 | [default7]: iteration 573/ 3100 | consumed samples: 1173504 | consumed tokens: 2403336192 | elapsed time per iteration (s): 141.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.378509E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.463 | TFLOPs: 147.64 | [default7]: iteration 574/ 3100 | consumed samples: 1175552 | consumed tokens: 2407530496 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.440238E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.71 | [default7]: iteration 575/ 3100 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.259118E-01 | grad norm: 0.350 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 576/ 3100 | consumed samples: 1179648 | consumed tokens: 2415919104 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.391802E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.73 | [default7]: iteration 577/ 3100 | consumed samples: 1181696 | consumed tokens: 2420113408 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.254859E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.471 | TFLOPs: 147.73 | [default7]: iteration 578/ 3100 | consumed samples: 1183744 | consumed tokens: 2424307712 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.249931E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 579/ 3100 | consumed samples: 1185792 | consumed tokens: 2428502016 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.326466E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.68 | [default7]: iteration 580/ 3100 | consumed samples: 1187840 | consumed tokens: 2432696320 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.330330E-01 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.460 | TFLOPs: 147.61 | [default7]: iteration 581/ 3100 | consumed samples: 1189888 | consumed tokens: 2436890624 | elapsed time per iteration 
(s): 142.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.324205E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.365 | TFLOPs: 146.64 | [default7]: iteration 582/ 3100 | consumed samples: 1191936 | consumed tokens: 2441084928 | elapsed time per iteration (s): 141.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.292152E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.59 | [default7]: iteration 583/ 3100 | consumed samples: 1193984 | consumed tokens: 2445279232 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.177873E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 584/ 3100 | consumed samples: 1196032 | consumed tokens: 2449473536 | elapsed time per iteration (s): 141.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.221896E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.453 | TFLOPs: 147.55 | [default7]: iteration 585/ 3100 | consumed samples: 1198080 | consumed tokens: 2453667840 | elapsed time per iteration (s): 142.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.279579E-01 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.405 | TFLOPs: 147.05 | [default7]: iteration 586/ 3100 | consumed samples: 1200128 | consumed tokens: 2457862144 | elapsed time per iteration (s): 141.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.208187E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.517 | 
TFLOPs: 148.19 | [default7]: iteration 587/ 3100 | consumed samples: 1202176 | consumed tokens: 2462056448 | elapsed time per iteration (s): 143.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.264195E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.284 | TFLOPs: 145.82 | [default7]: iteration 588/ 3100 | consumed samples: 1204224 | consumed tokens: 2466250752 | elapsed time per iteration (s): 142.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.167542E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.343 | TFLOPs: 146.42 | [default7]: iteration 589/ 3100 | consumed samples: 1206272 | consumed tokens: 2470445056 | elapsed time per iteration (s): 142.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.296976E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.375 | TFLOPs: 146.74 | [default7]: iteration 590/ 3100 | consumed samples: 1208320 | consumed tokens: 2474639360 | elapsed time per iteration (s): 142.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.235414E-01 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.351 | TFLOPs: 146.50 | [default7]: iteration 591/ 3100 | consumed samples: 1210368 | consumed tokens: 2478833664 | elapsed time per iteration (s): 141.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.244666E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.443 | TFLOPs: 147.44 | [default7]: iteration 592/ 3100 | consumed samples: 1212416 | consumed tokens: 2483027968 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
6.296257E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.15 | [default7]: iteration 593/ 3100 | consumed samples: 1214464 | consumed tokens: 2487222272 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.213579E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.82 | [default7]: iteration 594/ 3100 | consumed samples: 1216512 | consumed tokens: 2491416576 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.273442E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.80 | [default7]: iteration 595/ 3100 | consumed samples: 1218560 | consumed tokens: 2495610880 | elapsed time per iteration (s): 142.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.154448E-01 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.350 | TFLOPs: 146.49 | [default7]: iteration 596/ 3100 | consumed samples: 1220608 | consumed tokens: 2499805184 | elapsed time per iteration (s): 142.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.250472E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.338 | TFLOPs: 146.37 | [default7]: iteration 597/ 3100 | consumed samples: 1222656 | consumed tokens: 2503999488 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.146401E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 598/ 3100 | consumed samples: 
1224704 | consumed tokens: 2508193792 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.283456E-01 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 599/ 3100 | consumed samples: 1226752 | consumed tokens: 2512388096 | elapsed time per iteration (s): 142.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.157793E-01 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.393 | TFLOPs: 146.93 | [default7]: iteration 600/ 3100 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 142.02 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.259478E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.420 | TFLOPs: 147.21 | [default7]: iteration 601/ 3100 | consumed samples: 1230848 | consumed tokens: 2520776704 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.201351E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.14 | [default7]: iteration 602/ 3100 | consumed samples: 1232896 | consumed tokens: 2524971008 | elapsed time per iteration (s): 143.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.294594E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.279 | TFLOPs: 145.77 | [default7]: iteration 603/ 3100 | consumed samples: 1234944 | consumed tokens: 2529165312 | elapsed time per iteration (s): 141.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.238253E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.460 | TFLOPs: 147.61 | [default7]: iteration 604/ 3100 | consumed samples: 1236992 | consumed tokens: 2533359616 | elapsed time per iteration (s): 142.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.113828E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.417 | TFLOPs: 147.17 | [default7]: iteration 605/ 3100 | consumed samples: 1239040 | consumed tokens: 2537553920 | elapsed time per iteration (s): 142.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.158909E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.366 | TFLOPs: 146.66 | [default7]: iteration 606/ 3100 | consumed samples: 1241088 | consumed tokens: 2541748224 | elapsed time per iteration (s): 142.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.160704E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.409 | TFLOPs: 147.10 | [default7]: iteration 607/ 3100 | consumed samples: 1243136 | consumed tokens: 2545942528 | elapsed time per iteration (s): 142.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.225644E-01 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.360 | TFLOPs: 146.60 | [default7]: iteration 608/ 3100 | consumed samples: 1245184 | consumed tokens: 2550136832 | elapsed time per iteration (s): 141.84 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.210176E-01 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.439 | TFLOPs: 147.40 | [default7]: iteration 609/ 3100 | consumed samples: 1247232 | consumed tokens: 2554331136 | elapsed time per iteration (s): 
141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.197715E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 610/ 3100 | consumed samples: 1249280 | consumed tokens: 2558525440 | elapsed time per iteration (s): 141.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.208656E-01 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.525 | TFLOPs: 148.27 | [default7]: iteration 611/ 3100 | consumed samples: 1251328 | consumed tokens: 2562719744 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.120579E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 612/ 3100 | consumed samples: 1253376 | consumed tokens: 2566914048 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.260076E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.68 | [default7]: iteration 613/ 3100 | consumed samples: 1255424 | consumed tokens: 2571108352 | elapsed time per iteration (s): 141.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.183555E-01 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.518 | TFLOPs: 148.20 | [default7]: iteration 614/ 3100 | consumed samples: 1257472 | consumed tokens: 2575302656 | elapsed time per iteration (s): 142.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.176122E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.405 | 
TFLOPs: 147.05 | [default7]: iteration 615/ 3100 | consumed samples: 1259520 | consumed tokens: 2579496960 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.138933E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 616/ 3100 | consumed samples: 1261568 | consumed tokens: 2583691264 | elapsed time per iteration (s): 142.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.221910E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.326 | TFLOPs: 146.24 | [default7]: iteration 617/ 3100 | consumed samples: 1263616 | consumed tokens: 2587885568 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.058453E-01 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 618/ 3100 | consumed samples: 1265664 | consumed tokens: 2592079872 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.179011E-01 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 619/ 3100 | consumed samples: 1267712 | consumed tokens: 2596274176 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.180058E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 620/ 3100 | consumed samples: 1269760 | consumed tokens: 2600468480 | elapsed time per iteration (s): 141.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
6.188092E-01 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 | [default7]: iteration 621/ 3100 | consumed samples: 1271808 | consumed tokens: 2604662784 | elapsed time per iteration (s): 141.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.162535E-01 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.13 | [default7]: iteration 622/ 3100 | consumed samples: 1273856 | consumed tokens: 2608857088 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.166898E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 623/ 3100 | consumed samples: 1275904 | consumed tokens: 2613051392 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.163162E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 624/ 3100 | consumed samples: 1277952 | consumed tokens: 2617245696 | elapsed time per iteration (s): 143.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.118894E-01 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.312 | TFLOPs: 146.10 | [default7]: iteration 625/ 3100 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 140.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.205950E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.536 | TFLOPs: 148.39 | [default7]: iteration 626/ 3100 | consumed samples: 
1282048 | consumed tokens: 2625634304 | elapsed time per iteration (s): 142.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.094805E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.380 | TFLOPs: 146.80 | [default7]: iteration 627/ 3100 | consumed samples: 1284096 | consumed tokens: 2629828608 | elapsed time per iteration (s): 142.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.133140E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.358 | TFLOPs: 146.58 | [default7]: iteration 628/ 3100 | consumed samples: 1286144 | consumed tokens: 2634022912 | elapsed time per iteration (s): 142.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.194793E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.398 | TFLOPs: 146.98 | [default7]: iteration 629/ 3100 | consumed samples: 1288192 | consumed tokens: 2638217216 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.106086E-01 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 | [default7]: iteration 630/ 3100 | consumed samples: 1290240 | consumed tokens: 2642411520 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.112490E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 631/ 3100 | consumed samples: 1292288 | consumed tokens: 2646605824 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.086465E-01 | grad norm: 0.878 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 632/ 3100 | consumed samples: 1294336 | consumed tokens: 2650800128 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.107441E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 633/ 3100 | consumed samples: 1296384 | consumed tokens: 2654994432 | elapsed time per iteration (s): 141.84 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.158429E-01 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.439 | TFLOPs: 147.40 | [default7]: iteration 634/ 3100 | consumed samples: 1298432 | consumed tokens: 2659188736 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.125026E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.81 | [default7]: iteration 635/ 3100 | consumed samples: 1300480 | consumed tokens: 2663383040 | elapsed time per iteration (s): 142.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.150038E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.366 | TFLOPs: 146.66 | [default7]: iteration 636/ 3100 | consumed samples: 1302528 | consumed tokens: 2667577344 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.152463E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.474 | TFLOPs: 147.76 | [default7]: iteration 637/ 3100 | consumed samples: 1304576 | consumed tokens: 2671771648 | elapsed time per iteration (s): 
141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.115915E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.02 | [default7]: iteration 638/ 3100 | consumed samples: 1306624 | consumed tokens: 2675965952 | elapsed time per iteration (s): 142.94 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.169427E-01 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.328 | TFLOPs: 146.27 | [default7]: iteration 639/ 3100 | consumed samples: 1308672 | consumed tokens: 2680160256 | elapsed time per iteration (s): 142.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.087853E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.338 | TFLOPs: 146.37 | [default7]: iteration 640/ 3100 | consumed samples: 1310720 | consumed tokens: 2684354560 | elapsed time per iteration (s): 142.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.101274E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.348 | TFLOPs: 146.47 | [default7]: iteration 641/ 3100 | consumed samples: 1312768 | consumed tokens: 2688548864 | elapsed time per iteration (s): 142.65 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.171849E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.357 | TFLOPs: 146.56 | [default7]: iteration 642/ 3100 | consumed samples: 1314816 | consumed tokens: 2692743168 | elapsed time per iteration (s): 142.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.080205E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.390 | 
TFLOPs: 146.90 | [default7]: iteration 643/ 3100 | consumed samples: 1316864 | consumed tokens: 2696937472 | elapsed time per iteration (s): 142.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.165901E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.376 | TFLOPs: 146.76 | [default7]: iteration 644/ 3100 | consumed samples: 1318912 | consumed tokens: 2701131776 | elapsed time per iteration (s): 142.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.064447E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.352 | TFLOPs: 146.52 | [default7]: iteration 645/ 3100 | consumed samples: 1320960 | consumed tokens: 2705326080 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.115191E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.33 | [default7]: iteration 646/ 3100 | consumed samples: 1323008 | consumed tokens: 2709520384 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.097960E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 647/ 3100 | consumed samples: 1325056 | consumed tokens: 2713714688 | elapsed time per iteration (s): 142.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.106949E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.344 | TFLOPs: 146.43 | [default7]: iteration 648/ 3100 | consumed samples: 1327104 | consumed tokens: 2717908992 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
6.059289E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 649/ 3100 | consumed samples: 1329152 | consumed tokens: 2722103296 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.064846E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 650/ 3100 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.038973E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 651/ 3100 | consumed samples: 1333248 | consumed tokens: 2730491904 | elapsed time per iteration (s): 140.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.022108E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.564 | TFLOPs: 148.67 | [default7]: iteration 652/ 3100 | consumed samples: 1335296 | consumed tokens: 2734686208 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.036626E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.83 | [default7]: iteration 653/ 3100 | consumed samples: 1337344 | consumed tokens: 2738880512 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.023220E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 654/ 3100 | consumed samples: 
1339392 | consumed tokens: 2743074816 | elapsed time per iteration (s): 140.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.063397E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.532 | TFLOPs: 148.35 | [default7]: iteration 655/ 3100 | consumed samples: 1341440 | consumed tokens: 2747269120 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.033625E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 656/ 3100 | consumed samples: 1343488 | consumed tokens: 2751463424 | elapsed time per iteration (s): 141.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.123794E-01 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.72 | [default7]: iteration 657/ 3100 | consumed samples: 1345536 | consumed tokens: 2755657728 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.099074E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 658/ 3100 | consumed samples: 1347584 | consumed tokens: 2759852032 | elapsed time per iteration (s): 141.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.134493E-01 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.434 | TFLOPs: 147.35 | [default7]: iteration 659/ 3100 | consumed samples: 1349632 | consumed tokens: 2764046336 | elapsed time per iteration (s): 142.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.116225E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.372 | TFLOPs: 146.71 | [default7]: iteration 660/ 3100 | consumed samples: 1351680 | consumed tokens: 2768240640 | elapsed time per iteration (s): 141.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.097038E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.58 | [default7]: iteration 661/ 3100 | consumed samples: 1353728 | consumed tokens: 2772434944 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.076012E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 662/ 3100 | consumed samples: 1355776 | consumed tokens: 2776629248 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.017473E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 663/ 3100 | consumed samples: 1357824 | consumed tokens: 2780823552 | elapsed time per iteration (s): 140.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.993180E-01 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.529 | TFLOPs: 148.32 | [default7]: iteration 664/ 3100 | consumed samples: 1359872 | consumed tokens: 2785017856 | elapsed time per iteration (s): 140.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.034101E-01 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.526 | TFLOPs: 148.29 | [default7]: iteration 665/ 3100 | consumed samples: 1361920 | consumed tokens: 2789212160 | elapsed time per iteration (s): 
141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.959105E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 666/ 3100 | consumed samples: 1363968 | consumed tokens: 2793406464 | elapsed time per iteration (s): 141.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.977681E-01 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.441 | TFLOPs: 147.42 | [default7]: iteration 667/ 3100 | consumed samples: 1366016 | consumed tokens: 2797600768 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.015748E-01 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.467 | TFLOPs: 147.68 | [default7]: iteration 668/ 3100 | consumed samples: 1368064 | consumed tokens: 2801795072 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.004065E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 669/ 3100 | consumed samples: 1370112 | consumed tokens: 2805989376 | elapsed time per iteration (s): 141.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.033797E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.433 | TFLOPs: 147.34 | [default7]: iteration 670/ 3100 | consumed samples: 1372160 | consumed tokens: 2810183680 | elapsed time per iteration (s): 143.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.073196E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.307 | 
TFLOPs: 146.05 | [default7]: iteration 671/ 3100 | consumed samples: 1374208 | consumed tokens: 2814377984 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.063738E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 672/ 3100 | consumed samples: 1376256 | consumed tokens: 2818572288 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.028740E-01 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 673/ 3100 | consumed samples: 1378304 | consumed tokens: 2822766592 | elapsed time per iteration (s): 142.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.013594E-01 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.382 | TFLOPs: 146.82 | [default7]: iteration 674/ 3100 | consumed samples: 1380352 | consumed tokens: 2826960896 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.979874E-01 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 675/ 3100 | consumed samples: 1382400 | consumed tokens: 2831155200 | elapsed time per iteration (s): 141.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.007178E-01 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.454 | TFLOPs: 147.56 | [default7]: iteration 676/ 3100 | consumed samples: 1384448 | consumed tokens: 2835349504 | elapsed time per iteration (s): 142.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.948240E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.394 | TFLOPs: 146.94 | [default7]: iteration 677/ 3100 | consumed samples: 1386496 | consumed tokens: 2839543808 | elapsed time per iteration (s): 142.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.022387E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.413 | TFLOPs: 147.14 | [default7]: iteration 678/ 3100 | consumed samples: 1388544 | consumed tokens: 2843738112 | elapsed time per iteration (s): 141.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.060949E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.444 | TFLOPs: 147.45 | [default7]: iteration 679/ 3100 | consumed samples: 1390592 | consumed tokens: 2847932416 | elapsed time per iteration (s): 142.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.018386E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.385 | TFLOPs: 146.85 | [default7]: iteration 680/ 3100 | consumed samples: 1392640 | consumed tokens: 2852126720 | elapsed time per iteration (s): 142.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.982716E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.356 | TFLOPs: 146.56 | [default7]: iteration 681/ 3100 | consumed samples: 1394688 | consumed tokens: 2856321024 | elapsed time per iteration (s): 142.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.061923E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.420 | TFLOPs: 147.20 | [default7]: iteration 682/ 3100 | consumed samples: 
1396736 | consumed tokens: 2860515328 | elapsed time per iteration (s): 141.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.936971E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.521 | TFLOPs: 148.23 | [default7]: iteration 683/ 3100 | consumed samples: 1398784 | consumed tokens: 2864709632 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.954867E-01 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 684/ 3100 | consumed samples: 1400832 | consumed tokens: 2868903936 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.073931E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 685/ 3100 | consumed samples: 1402880 | consumed tokens: 2873098240 | elapsed time per iteration (s): 142.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.016283E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.384 | TFLOPs: 146.84 | [default7]: iteration 686/ 3100 | consumed samples: 1404928 | consumed tokens: 2877292544 | elapsed time per iteration (s): 141.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.047465E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.508 | TFLOPs: 148.11 | [default7]: iteration 687/ 3100 | consumed samples: 1406976 | consumed tokens: 2881486848 | elapsed time per iteration (s): 141.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.959442E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.72 | [default7]: iteration 688/ 3100 | consumed samples: 1409024 | consumed tokens: 2885681152 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.046486E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 689/ 3100 | consumed samples: 1411072 | consumed tokens: 2889875456 | elapsed time per iteration (s): 141.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.004462E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.445 | TFLOPs: 147.46 | [default7]: iteration 690/ 3100 | consumed samples: 1413120 | consumed tokens: 2894069760 | elapsed time per iteration (s): 142.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.003144E-01 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.388 | TFLOPs: 146.88 | [default7]: iteration 691/ 3100 | consumed samples: 1415168 | consumed tokens: 2898264064 | elapsed time per iteration (s): 141.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.988725E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.433 | TFLOPs: 147.34 | [default7]: iteration 692/ 3100 | consumed samples: 1417216 | consumed tokens: 2902458368 | elapsed time per iteration (s): 142.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.905765E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.370 | TFLOPs: 146.70 | [default7]: iteration 693/ 3100 | consumed samples: 1419264 | consumed tokens: 2906652672 | elapsed time per iteration (s): 
142.91 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.935369E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.331 | TFLOPs: 146.30 | [default7]: iteration 694/ 3100 | consumed samples: 1421312 | consumed tokens: 2910846976 | elapsed time per iteration (s): 142.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.950055E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.356 | TFLOPs: 146.55 | [default7]: iteration 695/ 3100 | consumed samples: 1423360 | consumed tokens: 2915041280 | elapsed time per iteration (s): 140.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.978186E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.526 | TFLOPs: 148.28 | [default7]: iteration 696/ 3100 | consumed samples: 1425408 | consumed tokens: 2919235584 | elapsed time per iteration (s): 141.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.969075E-01 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 697/ 3100 | consumed samples: 1427456 | consumed tokens: 2923429888 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.029679E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 | [default7]: iteration 698/ 3100 | consumed samples: 1429504 | consumed tokens: 2927624192 | elapsed time per iteration (s): 143.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.088644E-01 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.309 | 
TFLOPs: 146.07 | [default7]: iteration 699/ 3100 | consumed samples: 1431552 | consumed tokens: 2931818496 | elapsed time per iteration (s): 143.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.948851E-01 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.302 | TFLOPs: 146.00 | [default7]: iteration 700/ 3100 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 147.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.925049E-01 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.902 | TFLOPs: 141.92 | [default7]: iteration 701/ 3100 | consumed samples: 1435648 | consumed tokens: 2940207104 | elapsed time per iteration (s): 146.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.935391E-01 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.018 | TFLOPs: 143.11 | [default7]: iteration 702/ 3100 | consumed samples: 1437696 | consumed tokens: 2944401408 | elapsed time per iteration (s): 144.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.910982E-01 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.217 | TFLOPs: 145.14 | [default7]: iteration 703/ 3100 | consumed samples: 1439744 | consumed tokens: 2948595712 | elapsed time per iteration (s): 142.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 6.039896E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.408 | TFLOPs: 147.08 | [default7]: iteration 704/ 3100 | consumed samples: 1441792 | consumed tokens: 2952790016 | elapsed time per iteration (s): 143.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.940972E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.252 | TFLOPs: 145.49 | [default7]: iteration 705/ 3100 | consumed samples: 1443840 | consumed tokens: 2956984320 | elapsed time per iteration (s): 141.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.930418E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.431 | TFLOPs: 147.32 | [default7]: iteration 706/ 3100 | consumed samples: 1445888 | consumed tokens: 2961178624 | elapsed time per iteration (s): 143.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.926180E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.306 | TFLOPs: 146.04 | [default7]: iteration 707/ 3100 | consumed samples: 1447936 | consumed tokens: 2965372928 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.915858E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 708/ 3100 | consumed samples: 1449984 | consumed tokens: 2969567232 | elapsed time per iteration (s): 143.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.929629E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.264 | TFLOPs: 145.62 | [default7]: iteration 709/ 3100 | consumed samples: 1452032 | consumed tokens: 2973761536 | elapsed time per iteration (s): 146.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.920980E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.988 | TFLOPs: 142.80 | [default7]: iteration 710/ 3100 | consumed samples: 
1454080 | consumed tokens: 2977955840 | elapsed time per iteration (s): 142.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.829424E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.405 | TFLOPs: 147.06 | [default7]: iteration 711/ 3100 | consumed samples: 1456128 | consumed tokens: 2982150144 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.927553E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.15 | [default7]: iteration 712/ 3100 | consumed samples: 1458176 | consumed tokens: 2986344448 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.980108E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.81 | [default7]: iteration 713/ 3100 | consumed samples: 1460224 | consumed tokens: 2990538752 | elapsed time per iteration (s): 142.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.965166E-01 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.349 | TFLOPs: 146.48 | [default7]: iteration 714/ 3100 | consumed samples: 1462272 | consumed tokens: 2994733056 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.913858E-01 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.13 | [default7]: iteration 715/ 3100 | consumed samples: 1464320 | consumed tokens: 2998927360 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.962923E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.471 | TFLOPs: 147.73 | [default7]: iteration 716/ 3100 | consumed samples: 1466368 | consumed tokens: 3003121664 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.956753E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.459 | TFLOPs: 147.60 | [default7]: iteration 717/ 3100 | consumed samples: 1468416 | consumed tokens: 3007315968 | elapsed time per iteration (s): 142.68 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.933878E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.354 | TFLOPs: 146.53 | [default7]: iteration 718/ 3100 | consumed samples: 1470464 | consumed tokens: 3011510272 | elapsed time per iteration (s): 142.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.832549E-01 | grad norm: 1.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.393 | TFLOPs: 146.93 | [default7]: iteration 719/ 3100 | consumed samples: 1472512 | consumed tokens: 3015704576 | elapsed time per iteration (s): 141.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.963894E-01 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.424 | TFLOPs: 147.25 | [default7]: iteration 720/ 3100 | consumed samples: 1474560 | consumed tokens: 3019898880 | elapsed time per iteration (s): 141.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.895142E-01 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.521 | TFLOPs: 148.23 | [default7]: iteration 721/ 3100 | consumed samples: 1476608 | consumed tokens: 3024093184 | elapsed time per iteration (s): 
141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.934165E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 722/ 3100 | consumed samples: 1478656 | consumed tokens: 3028287488 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.948765E-01 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 723/ 3100 | consumed samples: 1480704 | consumed tokens: 3032481792 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.922184E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 724/ 3100 | consumed samples: 1482752 | consumed tokens: 3036676096 | elapsed time per iteration (s): 141.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.835904E-01 | grad norm: 1.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.517 | TFLOPs: 148.20 | [default7]: iteration 725/ 3100 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.944502E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 726/ 3100 | consumed samples: 1486848 | consumed tokens: 3045064704 | elapsed time per iteration (s): 141.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.823078E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.519 | 
TFLOPs: 148.22 | [default7]: iteration 727/ 3100 | consumed samples: 1488896 | consumed tokens: 3049259008 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.890587E-01 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.514 | TFLOPs: 148.17 | [default7]: iteration 728/ 3100 | consumed samples: 1490944 | consumed tokens: 3053453312 | elapsed time per iteration (s): 142.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.837691E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.334 | TFLOPs: 146.33 | [default7]: iteration 729/ 3100 | consumed samples: 1492992 | consumed tokens: 3057647616 | elapsed time per iteration (s): 142.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.823323E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.352 | TFLOPs: 146.51 | [default7]: iteration 730/ 3100 | consumed samples: 1495040 | consumed tokens: 3061841920 | elapsed time per iteration (s): 142.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.914023E-01 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.332 | TFLOPs: 146.31 | [default7]: iteration 731/ 3100 | consumed samples: 1497088 | consumed tokens: 3066036224 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.860320E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 732/ 3100 | consumed samples: 1499136 | consumed tokens: 3070230528 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.862100E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 733/ 3100 | consumed samples: 1501184 | consumed tokens: 3074424832 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.886378E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 148.00 | [default7]: iteration 734/ 3100 | consumed samples: 1503232 | consumed tokens: 3078619136 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.845551E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.466 | TFLOPs: 147.68 | [default7]: iteration 735/ 3100 | consumed samples: 1505280 | consumed tokens: 3082813440 | elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.862787E-01 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 736/ 3100 | consumed samples: 1507328 | consumed tokens: 3087007744 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.787562E-01 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 | [default7]: iteration 737/ 3100 | consumed samples: 1509376 | consumed tokens: 3091202048 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.767437E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.507 | TFLOPs: 148.09 | [default7]: iteration 738/ 3100 | consumed samples: 
1511424 | consumed tokens: 3095396352 | elapsed time per iteration (s): 142.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.864971E-01 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.388 | TFLOPs: 146.88 | [default7]: iteration 739/ 3100 | consumed samples: 1513472 | consumed tokens: 3099590656 | elapsed time per iteration (s): 142.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.854946E-01 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.367 | TFLOPs: 146.67 | [default7]: iteration 740/ 3100 | consumed samples: 1515520 | consumed tokens: 3103784960 | elapsed time per iteration (s): 141.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.944157E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.519 | TFLOPs: 148.21 | [default7]: iteration 741/ 3100 | consumed samples: 1517568 | consumed tokens: 3107979264 | elapsed time per iteration (s): 144.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.785881E-01 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.162 | TFLOPs: 144.57 | [default7]: iteration 742/ 3100 | consumed samples: 1519616 | consumed tokens: 3112173568 | elapsed time per iteration (s): 143.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.703576E-01 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.224 | TFLOPs: 145.20 | [default7]: iteration 743/ 3100 | consumed samples: 1521664 | consumed tokens: 3116367872 | elapsed time per iteration (s): 141.83 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.777282E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.440 | TFLOPs: 147.41 |
[default7]: iteration 744/ 3100 | consumed samples: 1523712 | consumed tokens: 3120562176 | elapsed time per iteration (s): 144.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.771205E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.182 | TFLOPs: 144.78 |
[default7]: iteration 745/ 3100 | consumed samples: 1525760 | consumed tokens: 3124756480 | elapsed time per iteration (s): 142.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.783058E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.413 | TFLOPs: 147.14 |
[default7]: iteration 746/ 3100 | consumed samples: 1527808 | consumed tokens: 3128950784 | elapsed time per iteration (s): 141.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.812372E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.470 | TFLOPs: 147.72 |
[default0]:saving checkpoint at iteration 747 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]:[2022-09-05 01:24:09,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_26-model_00-model_states.pt...
[default7]: iteration 747/ 3100 | consumed samples: 1529856 | consumed tokens: 3133145088 | elapsed time per iteration (s): 143.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.836755E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.252 | TFLOPs: 145.49 |
[default4]:[2022-09-05 01:24:09,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_27-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,803] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step747 is begin to save!
[default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_51-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_16-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_50-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_36-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_70-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_42-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_34-model_00-model_states.pt...
[default4]:[2022-09-05 01:24:09,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_71_model_states.pt...
[default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_37-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_32-model_00-model_states.pt...
[default0]:[2022-09-05 01:24:09,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_06-model_00-model_states.pt...
[default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_43-model_00-model_states.pt...
[default4]:[2022-09-05 01:24:09,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_09-model_00-model_states.pt...
[default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_31-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_65-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_29-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_52-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_05-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_40-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_56-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_72-model_00-model_states.pt... 
[default0]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_66-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_49-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_04-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_48-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_10-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_18-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_57-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_25-model_00-model_states.pt... 
[default4]:[2022-09-05 01:24:09,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_63-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_13-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_53-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_24-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_44-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_64-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_62-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_60-model_00-model_states.pt... 
[default4]:[2022-09-05 01:24:09,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_47-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_08-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_68-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_21-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_38-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_35-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_41-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_28-model_00-model_states.pt... 
[default0]:[2022-09-05 01:24:09,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_46-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_71-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_12-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_20-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_33-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_67-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_15-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_30-model_00-model_states.pt... 
[default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_69-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_14-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,916] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_01-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_19-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_03-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_23-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_59-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_61-model_00-model_states.pt... 
[default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_22-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_45-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_39-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_07-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_11-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_17-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_55-model_00-model_states.pt... [default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_58-model_00-model_states.pt... 
[default0]:[2022-09-05 01:24:09,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_54-model_00-model_states.pt... [default4]:[2022-09-05 01:24:09,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_71_model_states.pt. [default0]:[2022-09-05 01:24:12,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_04-model_00-model_states.pt. [default0]:[2022-09-05 01:24:12,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_02_model_states.pt... [default0]:[2022-09-05 01:24:12,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_02_model_states.pt. [default0]:[2022-09-05 01:24:13,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_68-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_66_model_states.pt... [default0]:[2022-09-05 01:24:13,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_66_model_states.pt. [default0]:[2022-09-05 01:24:13,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_28-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,135] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_26_model_states.pt... [default0]:[2022-09-05 01:24:13,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_26_model_states.pt. [default4]:[2022-09-05 01:24:13,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_63-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_61_model_states.pt... [default4]:[2022-09-05 01:24:13,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_61_model_states.pt. [default0]:[2022-09-05 01:24:13,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_08-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_06_model_states.pt... [default0]:[2022-09-05 01:24:13,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_06_model_states.pt. [default4]:[2022-09-05 01:24:13,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_27-model_00-model_states.pt. 
[default4]:[2022-09-05 01:24:13,202] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_25_model_states.pt... [default4]:[2022-09-05 01:24:13,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_25_model_states.pt. [default0]:[2022-09-05 01:24:13,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_14-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_12_model_states.pt... [default0]:[2022-09-05 01:24:13,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_12_model_states.pt. [default0]:[2022-09-05 01:24:13,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_72-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,295] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_74-model_00-model_states.pt... [default0]:[2022-09-05 01:24:13,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_74-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_70_model_states.pt... [default0]:[2022-09-05 01:24:13,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_70_model_states.pt. [default0]:[2022-09-05 01:24:13,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_48-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_46_model_states.pt... [default0]:[2022-09-05 01:24:13,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_46_model_states.pt. [default4]:[2022-09-05 01:24:13,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_25-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,287] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_23_model_states.pt... [default4]:[2022-09-05 01:24:13,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_23_model_states.pt. [default0]:[2022-09-05 01:24:13,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_24-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,334] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_22_model_states.pt... [default0]:[2022-09-05 01:24:13,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_22_model_states.pt. [default4]:[2022-09-05 01:24:13,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_19-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_17_model_states.pt... [default4]:[2022-09-05 01:24:13,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_17_model_states.pt. [default4]:[2022-09-05 01:24:13,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_23-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_21_model_states.pt... [default4]:[2022-09-05 01:24:13,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_21_model_states.pt. [default4]:[2022-09-05 01:24:13,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_11-model_00-model_states.pt. 
[default4]:[2022-09-05 01:24:13,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_09_model_states.pt... [default4]:[2022-09-05 01:24:13,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_09_model_states.pt. [default0]:[2022-09-05 01:24:13,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_16-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_14_model_states.pt... [default0]:[2022-09-05 01:24:13,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_14_model_states.pt. [default0]:[2022-09-05 01:24:13,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_70-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_68_model_states.pt... [default0]:[2022-09-05 01:24:13,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_68_model_states.pt. [default4]:[2022-09-05 01:24:13,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_37-model_00-model_states.pt. 
[default4]:[2022-09-05 01:24:13,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_35_model_states.pt... [default4]:[2022-09-05 01:24:13,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_35_model_states.pt. [default4]:[2022-09-05 01:24:13,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_29-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_27_model_states.pt... [default4]:[2022-09-05 01:24:13,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_27_model_states.pt. [default0]:[2022-09-05 01:24:13,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_40-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_49-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,414] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_47_model_states.pt... [default4]:[2022-09-05 01:24:13,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_47_model_states.pt. 
[default0]:[2022-09-05 01:24:13,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_18-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_16_model_states.pt... [default0]:[2022-09-05 01:24:13,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_16_model_states.pt. [default0]:[2022-09-05 01:24:13,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_26-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_24_model_states.pt... [default0]:[2022-09-05 01:24:13,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_24_model_states.pt. [default4]:[2022-09-05 01:24:13,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_13-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_11_model_states.pt... [default4]:[2022-09-05 01:24:13,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_11_model_states.pt. 
[default0]:[2022-09-05 01:24:13,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_64-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_62_model_states.pt... [default0]:[2022-09-05 01:24:13,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_62_model_states.pt. [default4]:[2022-09-05 01:24:13,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_71-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_69_model_states.pt... [default4]:[2022-09-05 01:24:13,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_69_model_states.pt. [default4]:[2022-09-05 01:24:13,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_69-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_67_model_states.pt... [default4]:[2022-09-05 01:24:13,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_67_model_states.pt. 
[default0]:[2022-09-05 01:24:13,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_22-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_20_model_states.pt... [default0]:[2022-09-05 01:24:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_20_model_states.pt. [default4]:[2022-09-05 01:24:13,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_17-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_15_model_states.pt... [default4]:[2022-09-05 01:24:13,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_15_model_states.pt. [default0]:[2022-09-05 01:24:13,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_36-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,519] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_34_model_states.pt... [default0]:[2022-09-05 01:24:13,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_34_model_states.pt. 
[default0]:[2022-09-05 01:24:13,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_06-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_04_model_states.pt... [default0]:[2022-09-05 01:24:13,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_04_model_states.pt. [default4]:[2022-09-05 01:24:13,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_09-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_07_model_states.pt... [default4]:[2022-09-05 01:24:13,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_07_model_states.pt. [default0]:[2022-09-05 01:24:13,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_38_model_states.pt... [default0]:[2022-09-05 01:24:13,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_38_model_states.pt. [default0]:[2022-09-05 01:24:13,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_10-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,495] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_08_model_states.pt... [default0]:[2022-09-05 01:24:13,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_08_model_states.pt. [default4]:[2022-09-05 01:24:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_21-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,513] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_19_model_states.pt... [default4]:[2022-09-05 01:24:13,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_19_model_states.pt. [default4]:[2022-09-05 01:24:13,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_41-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_39_model_states.pt... [default4]:[2022-09-05 01:24:13,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_39_model_states.pt. [default4]:[2022-09-05 01:24:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_15-model_00-model_states.pt. 
[default4]:[2022-09-05 01:24:13,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_13_model_states.pt... [default4]:[2022-09-05 01:24:13,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_13_model_states.pt. [default4]:[2022-09-05 01:24:13,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_07-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_05_model_states.pt... [default4]:[2022-09-05 01:24:13,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_05_model_states.pt. [default4]:[2022-09-05 01:24:13,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_55-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_53_model_states.pt... [default4]:[2022-09-05 01:24:13,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_53_model_states.pt. [default0]:[2022-09-05 01:24:13,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_54-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,584] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_52_model_states.pt... [default0]:[2022-09-05 01:24:13,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_52_model_states.pt. [default0]:[2022-09-05 01:24:13,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_50-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_48_model_states.pt... [default0]:[2022-09-05 01:24:13,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_48_model_states.pt. [default0]:[2022-09-05 01:24:13,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_42-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_40_model_states.pt... [default0]:[2022-09-05 01:24:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_40_model_states.pt. [default4]:[2022-09-05 01:24:13,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_31-model_00-model_states.pt. 
[default4]:[2022-09-05 01:24:13,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_29_model_states.pt... [default4]:[2022-09-05 01:24:13,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_29_model_states.pt. [default4]:[2022-09-05 01:24:13,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_65-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_63_model_states.pt... [default4]:[2022-09-05 01:24:13,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_63_model_states.pt. [default4]:[2022-09-05 01:24:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_05-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_03_model_states.pt... [default4]:[2022-09-05 01:24:13,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_03_model_states.pt. [default0]:[2022-09-05 01:24:13,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_44-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_42_model_states.pt... [default0]:[2022-09-05 01:24:13,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_42_model_states.pt. [default4]:[2022-09-05 01:24:13,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_47-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_45_model_states.pt... [default4]:[2022-09-05 01:24:13,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_45_model_states.pt. [default0]:[2022-09-05 01:24:13,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_38-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_36_model_states.pt... [default0]:[2022-09-05 01:24:13,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_12-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_10_model_states.pt... 
[default0]:[2022-09-05 01:24:13,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_10_model_states.pt. [default0]:[2022-09-05 01:24:13,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_20-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_18_model_states.pt... [default0]:[2022-09-05 01:24:13,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_18_model_states.pt. [default4]:[2022-09-05 01:24:13,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_33-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_31_model_states.pt... [default4]:[2022-09-05 01:24:13,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_31_model_states.pt. [default0]:[2022-09-05 01:24:13,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_30-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_28_model_states.pt... 
[default4]:[2022-09-05 01:24:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_45-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_43_model_states.pt... [default4]:[2022-09-05 01:24:13,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_43_model_states.pt. [default4]:[2022-09-05 01:24:13,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_39-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_37_model_states.pt... [default4]:[2022-09-05 01:24:13,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_37_model_states.pt. [default0]:[2022-09-05 01:24:13,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_58-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,676] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_56_model_states.pt... [default0]:[2022-09-05 01:24:13,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_56_model_states.pt. 
[default4]:[2022-09-05 01:24:13,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_51-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_49_model_states.pt... [default4]:[2022-09-05 01:24:13,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_49_model_states.pt. [default0]:[2022-09-05 01:24:13,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_32-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_30_model_states.pt... [default0]:[2022-09-05 01:24:13,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_30_model_states.pt. [default4]:[2022-09-05 01:24:13,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_53-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_51_model_states.pt... [default4]:[2022-09-05 01:24:13,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_51_model_states.pt. 
[default0]:[2022-09-05 01:24:13,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_62-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_60_model_states.pt... [default0]:[2022-09-05 01:24:13,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_60_model_states.pt. [default0]:[2022-09-05 01:24:13,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_36_model_states.pt. [default4]:[2022-09-05 01:24:13,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_35-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,740] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_33_model_states.pt... [default4]:[2022-09-05 01:24:13,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_33_model_states.pt. [default0]:[2022-09-05 01:24:13,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_46-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,693] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_44_model_states.pt... 
[default0]:[2022-09-05 01:24:13,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_44_model_states.pt. [default0]:[2022-09-05 01:24:13,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_28_model_states.pt. [default4]:[2022-09-05 01:24:13,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_43-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_41_model_states.pt... [default4]:[2022-09-05 01:24:13,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_41_model_states.pt. [default0]:[2022-09-05 01:24:13,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_52-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_50_model_states.pt... [default0]:[2022-09-05 01:24:13,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_50_model_states.pt. [default0]:[2022-09-05 01:24:13,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_60-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:13,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_58_model_states.pt... [default0]:[2022-09-05 01:24:13,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_58_model_states.pt. [default4]:[2022-09-05 01:24:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_67-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_65_model_states.pt... [default4]:[2022-09-05 01:24:13,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_65_model_states.pt. [default4]:[2022-09-05 01:24:13,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_03-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_01_model_states.pt... [default4]:[2022-09-05 01:24:13,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_59-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,824] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_57_model_states.pt... 
[default4]:[2022-09-05 01:24:13,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_57_model_states.pt. [default4]:[2022-09-05 01:24:13,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_61-model_00-model_states.pt. [default4]:[2022-09-05 01:24:13,811] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_59_model_states.pt... [default4]:[2022-09-05 01:24:13,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_59_model_states.pt. [default0]:[2022-09-05 01:24:13,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_34-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_32_model_states.pt... [default0]:[2022-09-05 01:24:13,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_32_model_states.pt. [default0]:[2022-09-05 01:24:13,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_66-model_00-model_states.pt. [default0]:[2022-09-05 01:24:13,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_64_model_states.pt... 
[default0]:[2022-09-05 01:24:13,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_64_model_states.pt. [default4]:[2022-09-05 01:24:13,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_01_model_states.pt. [default0]:[2022-09-05 01:24:14,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_56-model_00-model_states.pt. [default0]:[2022-09-05 01:24:14,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_54_model_states.pt... [default0]:[2022-09-05 01:24:14,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_54_model_states.pt. [default4]:[2022-09-05 01:24:14,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_57-model_00-model_states.pt. [default4]:[2022-09-05 01:24:14,078] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_55_model_states.pt... [default4]:[2022-09-05 01:24:14,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_55_model_states.pt. [default0]:[2022-09-05 01:24:15,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/layer_01-model_00-model_states.pt. 
[default0]:[2022-09-05 01:24:15,035] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_00_model_states.pt [default0]:[2022-09-05 01:24:15,035] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_00_model_states.pt... [default0]:[2022-09-05 01:24:15,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/mp_rank_00_model_states.pt. [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... 
[default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... 
[default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... 
[default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... 
[default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... 
[default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... 
[default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... 
[default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... 
[default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... 
[default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... 
[default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... 
[default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... 
[default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... 
[default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... 
[default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... 
[default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default7]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... 
[default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default6]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default4]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... 
[default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default1]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default3]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default0]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... 
[default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default2]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default5]:[2022-09-05 01:24:15,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default0]:[2022-09-05 01:24:22,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-05 01:24:22,733] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default3]:[2022-09-05 01:24:22,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-05 01:24:22,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default4]:[2022-09-05 01:24:22,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. 
[default4]:[2022-09-05 01:24:22,889] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default2]:[2022-09-05 01:24:22,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-05 01:24:22,841] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default3]:[2022-09-05 01:24:23,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. [default3]:[2022-09-05 01:24:23,004] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default1]:[2022-09-05 01:24:22,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-05 01:24:22,948] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default0]:[2022-09-05 01:24:23,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. 
[default0]:[2022-09-05 01:24:23,022] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default4]:[2022-09-05 01:24:23,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-05 01:24:23,028] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default1]:[2022-09-05 01:24:23,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. [default1]:[2022-09-05 01:24:23,087] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default4]:[2022-09-05 01:24:23,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-05 01:24:23,096] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default2]:[2022-09-05 01:24:23,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. 
[default2]:[2022-09-05 01:24:23,040] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default2]:[2022-09-05 01:24:23,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-05 01:24:23,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default6]:[2022-09-05 01:24:23,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-05 01:24:23,220] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default6]:[2022-09-05 01:24:23,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. [default6]:[2022-09-05 01:24:23,247] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default2]:[2022-09-05 01:24:23,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. 
[default2]:[2022-09-05 01:24:23,227] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default1]:[2022-09-05 01:24:23,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-05 01:24:23,296] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default4]:[2022-09-05 01:24:23,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-05 01:24:23,329] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default0]:[2022-09-05 01:24:23,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-05 01:24:23,299] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default3]:[2022-09-05 01:24:23,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. 
[default3]:[2022-09-05 01:24:23,346] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default0]:[2022-09-05 01:24:23,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-05 01:24:23,361] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default1]:[2022-09-05 01:24:23,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. [default1]:[2022-09-05 01:24:23,401] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default7]:[2022-09-05 01:24:23,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. [default7]:[2022-09-05 01:24:23,465] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default1]:[2022-09-05 01:24:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. 
[default1]:[2022-09-05 01:24:23,474] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default5]:[2022-09-05 01:24:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. [default5]:[2022-09-05 01:24:23,559] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default3]:[2022-09-05 01:24:23,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. [default3]:[2022-09-05 01:24:23,604] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default2]:[2022-09-05 01:24:23,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. [default2]:[2022-09-05 01:24:23,591] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default3]:[2022-09-05 01:24:23,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. 
[default3]:[2022-09-05 01:24:23,650] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default2]:[2022-09-05 01:24:23,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-05 01:24:23,615] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default5]:[2022-09-05 01:24:23,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-05 01:24:23,601] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default1]:[2022-09-05 01:24:23,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. [default1]:[2022-09-05 01:24:23,686] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default7]:[2022-09-05 01:24:23,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. 
[default7]:[2022-09-05 01:24:23,689] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default0]:[2022-09-05 01:24:23,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-05 01:24:23,730] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default5]:[2022-09-05 01:24:23,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-05 01:24:23,792] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default2]:[2022-09-05 01:24:23,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. [default2]:[2022-09-05 01:24:23,797] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default4]:[2022-09-05 01:24:23,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. 
[default4]:[2022-09-05 01:24:23,830] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default0]:[2022-09-05 01:24:23,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-05 01:24:23,820] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default7]:[2022-09-05 01:24:23,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. [default7]:[2022-09-05 01:24:23,847] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default4]:[2022-09-05 01:24:23,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. [default4]:[2022-09-05 01:24:23,923] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default5]:[2022-09-05 01:24:23,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. 
[default5]:[2022-09-05 01:24:23,972] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default3]:[2022-09-05 01:24:23,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. [default3]:[2022-09-05 01:24:23,936] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default5]:[2022-09-05 01:24:23,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. [default5]:[2022-09-05 01:24:23,990] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default7]:[2022-09-05 01:24:23,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. [default7]:[2022-09-05 01:24:23,947] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default2]:[2022-09-05 01:24:23,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. 
[default2]:[2022-09-05 01:24:23,954] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default5]:[2022-09-05 01:24:23,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. [default5]:[2022-09-05 01:24:23,981] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default5]:[2022-09-05 01:24:23,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-05 01:24:23,975] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default5]:[2022-09-05 01:24:24,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. [default5]:[2022-09-05 01:24:24,015] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default0]:[2022-09-05 01:24:23,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. 
[default0]:[2022-09-05 01:24:23,969] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default4]:[2022-09-05 01:24:24,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-05 01:24:24,036] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default0]:[2022-09-05 01:24:23,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. [default0]:[2022-09-05 01:24:23,982] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default6]:[2022-09-05 01:24:23,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. [default6]:[2022-09-05 01:24:23,989] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default4]:[2022-09-05 01:24:24,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. 
[default4]:[2022-09-05 01:24:24,107] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default5]:[2022-09-05 01:24:24,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. [default5]:[2022-09-05 01:24:24,100] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default6]:[2022-09-05 01:24:24,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. [default6]:[2022-09-05 01:24:24,050] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default7]:[2022-09-05 01:24:24,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. [default7]:[2022-09-05 01:24:24,116] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default5]:[2022-09-05 01:24:24,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. 
[default5]:[2022-09-05 01:24:24,077] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default2]:[2022-09-05 01:24:24,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-05 01:24:24,112] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default1]:[2022-09-05 01:24:24,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-05 01:24:24,193] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default3]:[2022-09-05 01:24:24,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-05 01:24:24,143] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default7]:[2022-09-05 01:24:24,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. 
[default7]:[2022-09-05 01:24:24,200] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default0]:[2022-09-05 01:24:24,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. [default0]:[2022-09-05 01:24:24,201] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default7]:[2022-09-05 01:24:24,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-05 01:24:24,157] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default5]:[2022-09-05 01:24:24,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. [default5]:[2022-09-05 01:24:24,190] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default1]:[2022-09-05 01:24:24,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. 
[default1]:[2022-09-05 01:24:24,253] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default0]:[2022-09-05 01:24:24,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. [default0]:[2022-09-05 01:24:24,183] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default3]:[2022-09-05 01:24:24,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-05 01:24:24,283] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default0]:[2022-09-05 01:24:24,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-05 01:24:24,265] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default1]:[2022-09-05 01:24:24,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. 
[default1]:[2022-09-05 01:24:24,302] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default6]:[2022-09-05 01:24:24,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-05 01:24:24,279] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default2]:[2022-09-05 01:24:24,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. [default2]:[2022-09-05 01:24:24,273] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default5]:[2022-09-05 01:24:24,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-05 01:24:24,293] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default5]:[2022-09-05 01:24:24,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. 
[default5]:[2022-09-05 01:24:24,279] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default3]:[2022-09-05 01:24:24,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-05 01:24:24,317] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default1]:[2022-09-05 01:24:24,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-05 01:24:24,272] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default7]:[2022-09-05 01:24:24,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-05 01:24:24,329] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default7]:[2022-09-05 01:24:24,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. 
[default7]:[2022-09-05 01:24:24,336] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default0]:[2022-09-05 01:24:24,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. [default0]:[2022-09-05 01:24:24,343] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default4]:[2022-09-05 01:24:24,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-05 01:24:24,408] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default3]:[2022-09-05 01:24:24,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-05 01:24:24,430] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default2]:[2022-09-05 01:24:24,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. 
[default2]:[2022-09-05 01:24:24,501] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default5]:[2022-09-05 01:24:24,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-05 01:24:24,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default6]:[2022-09-05 01:24:24,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. [default6]:[2022-09-05 01:24:24,449] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default4]:[2022-09-05 01:24:24,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. [default4]:[2022-09-05 01:24:24,501] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default6]:[2022-09-05 01:24:24,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. 
[default6]:[2022-09-05 01:24:24,500] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default2]:[2022-09-05 01:24:24,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-05 01:24:24,544] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default5]:[2022-09-05 01:24:24,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. [default5]:[2022-09-05 01:24:24,479] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default4]:[2022-09-05 01:24:24,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-05 01:24:24,479] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default3]:[2022-09-05 01:24:24,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. 
[default3]:[2022-09-05 01:24:24,518] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default5]:[2022-09-05 01:24:24,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. [default5]:[2022-09-05 01:24:24,501] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default7]:[2022-09-05 01:24:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-05 01:24:24,586] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default7]:[2022-09-05 01:24:24,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-05 01:24:24,575] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default3]:[2022-09-05 01:24:24,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. 
[default3]:[2022-09-05 01:24:24,588] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default4]:[2022-09-05 01:24:24,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. [default4]:[2022-09-05 01:24:24,586] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default3]:[2022-09-05 01:24:24,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-05 01:24:24,606] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default5]:[2022-09-05 01:24:24,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. [default5]:[2022-09-05 01:24:24,608] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default7]:[2022-09-05 01:24:24,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. 
[default7]:[2022-09-05 01:24:24,645] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default0]:[2022-09-05 01:24:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. [default0]:[2022-09-05 01:24:24,566] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default3]:[2022-09-05 01:24:24,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-05 01:24:24,635] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default6]:[2022-09-05 01:24:24,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. [default6]:[2022-09-05 01:24:24,601] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default6]:[2022-09-05 01:24:24,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. 
[default6]:[2022-09-05 01:24:24,585] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default6]:[2022-09-05 01:24:24,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-05 01:24:24,646] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default4]:[2022-09-05 01:24:24,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. [default4]:[2022-09-05 01:24:24,674] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default2]:[2022-09-05 01:24:24,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. [default2]:[2022-09-05 01:24:24,700] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default5]:[2022-09-05 01:24:24,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. 
[default5]:[2022-09-05 01:24:24,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default3]:[2022-09-05 01:24:24,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-05 01:24:24,655] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default4]:[2022-09-05 01:24:24,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-05 01:24:24,713] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default6]:[2022-09-05 01:24:24,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. [default6]:[2022-09-05 01:24:24,712] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default7]:[2022-09-05 01:24:24,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. 
[default7]:[2022-09-05 01:24:24,706] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default4]:[2022-09-05 01:24:24,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-05 01:24:24,760] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default2]:[2022-09-05 01:24:24,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. [default2]:[2022-09-05 01:24:24,731] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default6]:[2022-09-05 01:24:24,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. [default6]:[2022-09-05 01:24:24,824] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default4]:[2022-09-05 01:24:24,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. 
[default4]:[2022-09-05 01:24:24,843] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default3]:[2022-09-05 01:24:24,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. [default3]:[2022-09-05 01:24:24,774] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default6]:[2022-09-05 01:24:24,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. [default6]:[2022-09-05 01:24:24,864] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default6]:[2022-09-05 01:24:24,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. [default6]:[2022-09-05 01:24:24,788] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default2]:[2022-09-05 01:24:24,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. 
[default2]:[2022-09-05 01:24:24,836] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default4]:[2022-09-05 01:24:24,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. [default4]:[2022-09-05 01:24:24,821] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default4]:[2022-09-05 01:24:24,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. [default4]:[2022-09-05 01:24:24,828] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default2]:[2022-09-05 01:24:24,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. [default2]:[2022-09-05 01:24:24,876] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default3]:[2022-09-05 01:24:24,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. 
[default3]:[2022-09-05 01:24:24,904] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default1]:[2022-09-05 01:24:24,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-05 01:24:24,903] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default1]:[2022-09-05 01:24:24,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-05 01:24:24,946] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default0]:[2022-09-05 01:24:24,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. [default0]:[2022-09-05 01:24:24,955] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default7]:[2022-09-05 01:24:24,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. 
[default7]:[2022-09-05 01:24:24,972] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default5]:[2022-09-05 01:24:24,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-05 01:24:24,958] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default5]:[2022-09-05 01:24:24,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-05 01:24:24,976] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default4]:[2022-09-05 01:24:25,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. [default4]:[2022-09-05 01:24:25,059] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default5]:[2022-09-05 01:24:25,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. 
[default5]:[2022-09-05 01:24:25,037] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default3]:[2022-09-05 01:24:25,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-05 01:24:25,050] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default3]:[2022-09-05 01:24:25,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-05 01:24:25,094] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default3]:[2022-09-05 01:24:25,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-05 01:24:25,022] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default5]:[2022-09-05 01:24:25,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. 
[default5]:[2022-09-05 01:24:25,049] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default0]:[2022-09-05 01:24:25,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. [default0]:[2022-09-05 01:24:25,113] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default2]:[2022-09-05 01:24:25,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. [default2]:[2022-09-05 01:24:25,049] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default1]:[2022-09-05 01:24:25,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. [default1]:[2022-09-05 01:24:25,112] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default7]:[2022-09-05 01:24:25,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. 
[default7]:[2022-09-05 01:24:25,107] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default2]:[2022-09-05 01:24:25,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. [default2]:[2022-09-05 01:24:25,106] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default0]:[2022-09-05 01:24:25,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-05 01:24:25,103] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default7]:[2022-09-05 01:24:25,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-05 01:24:25,161] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default1]:[2022-09-05 01:24:25,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. 
[default1]:[2022-09-05 01:24:25,203] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default2]:[2022-09-05 01:24:25,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-05 01:24:25,209] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default1]:[2022-09-05 01:24:25,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-05 01:24:25,207] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default2]:[2022-09-05 01:24:25,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. [default2]:[2022-09-05 01:24:25,180] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default3]:[2022-09-05 01:24:25,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. 
[default3]:[2022-09-05 01:24:25,215] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default1]:[2022-09-05 01:24:25,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. [default1]:[2022-09-05 01:24:25,286] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default0]:[2022-09-05 01:24:25,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. [default0]:[2022-09-05 01:24:25,271] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default2]:[2022-09-05 01:24:25,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-05 01:24:25,283] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default6]:[2022-09-05 01:24:25,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. 
[default6]:[2022-09-05 01:24:25,302] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default1]:[2022-09-05 01:24:25,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-05 01:24:25,308] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default3]:[2022-09-05 01:24:25,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-05 01:24:25,334] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default1]:[2022-09-05 01:24:25,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. [default1]:[2022-09-05 01:24:25,414] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default6]:[2022-09-05 01:24:25,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. 
[default6]:[2022-09-05 01:24:25,464] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default6]:[2022-09-05 01:24:25,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-05 01:24:25,502] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default7]:[2022-09-05 01:24:25,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-05 01:24:25,535] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default6]:[2022-09-05 01:24:25,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-05 01:24:25,594] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default2]:[2022-09-05 01:24:25,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. 
[default2]:[2022-09-05 01:24:25,553] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default6]:[2022-09-05 01:24:25,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-05 01:24:25,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default3]:[2022-09-05 01:24:25,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-05 01:24:25,644] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default1]:[2022-09-05 01:24:25,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-05 01:24:25,709] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default1]:[2022-09-05 01:24:25,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. 
[default1]:[2022-09-05 01:24:25,752] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default2]:[2022-09-05 01:24:25,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. [default2]:[2022-09-05 01:24:25,865] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default7]:[2022-09-05 01:24:25,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. [default7]:[2022-09-05 01:24:25,793] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default1]:[2022-09-05 01:24:25,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-05 01:24:25,932] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default0]:[2022-09-05 01:24:25,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. 
[default0]:[2022-09-05 01:24:25,931] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default3]:[2022-09-05 01:24:26,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-05 01:24:26,018] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default4]:[2022-09-05 01:24:25,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. [default4]:[2022-09-05 01:24:25,932] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default1]:[2022-09-05 01:24:26,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-05 01:24:26,066] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default7]:[2022-09-05 01:24:26,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. 
[default7]:[2022-09-05 01:24:26,181] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default2]:[2022-09-05 01:24:26,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. [default2]:[2022-09-05 01:24:26,456] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default6]:[2022-09-05 01:24:26,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-05 01:24:26,406] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default7]:[2022-09-05 01:24:26,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. [default7]:[2022-09-05 01:24:26,510] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default1]:[2022-09-05 01:24:26,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. 
[default1]:[2022-09-05 01:24:26,630] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default4]:[2022-09-05 01:24:26,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. [default4]:[2022-09-05 01:24:26,687] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default2]:[2022-09-05 01:24:26,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. [default2]:[2022-09-05 01:24:26,741] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default7]:[2022-09-05 01:24:26,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-05 01:24:26,798] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default2]:[2022-09-05 01:24:26,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. 
[default2]:[2022-09-05 01:24:26,834] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default0]:[2022-09-05 01:24:26,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. [default0]:[2022-09-05 01:24:26,830] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default0]:[2022-09-05 01:24:26,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. [default0]:[2022-09-05 01:24:26,891] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default0]:[2022-09-05 01:24:26,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. [default0]:[2022-09-05 01:24:26,946] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default6]:[2022-09-05 01:24:26,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. 
[default6]:[2022-09-05 01:24:26,924] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default1]:[2022-09-05 01:24:26,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. [default1]:[2022-09-05 01:24:26,965] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default7]:[2022-09-05 01:24:26,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. [default7]:[2022-09-05 01:24:26,947] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default5]:[2022-09-05 01:24:26,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-05 01:24:26,937] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default6]:[2022-09-05 01:24:26,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. 
[default6]:[2022-09-05 01:24:26,964] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default7]:[2022-09-05 01:24:27,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-05 01:24:27,012] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default4]:[2022-09-05 01:24:26,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-05 01:24:26,949] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default1]:[2022-09-05 01:24:27,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. [default1]:[2022-09-05 01:24:27,025] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default1]:[2022-09-05 01:24:27,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. 
[default1]:[2022-09-05 01:24:27,042] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default4]:[2022-09-05 01:24:27,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-05 01:24:27,033] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default3]:[2022-09-05 01:24:27,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-05 01:24:27,036] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default5]:[2022-09-05 01:24:27,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. [default5]:[2022-09-05 01:24:27,224] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default3]:[2022-09-05 01:24:27,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. 
[default3]:[2022-09-05 01:24:27,199] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default0]:[2022-09-05 01:24:27,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. [default0]:[2022-09-05 01:24:27,208] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default0]:[2022-09-05 01:24:27,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-05 01:24:27,239] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default6]:[2022-09-05 01:24:27,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. [default6]:[2022-09-05 01:24:27,329] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default6]:[2022-09-05 01:24:27,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. 
[default6]:[2022-09-05 01:24:27,337] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default3]:[2022-09-05 01:24:27,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. [default3]:[2022-09-05 01:24:27,467] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default7]:[2022-09-05 01:24:27,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. [default7]:[2022-09-05 01:24:27,507] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default4]:[2022-09-05 01:24:27,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-05 01:24:27,540] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default0]:[2022-09-05 01:24:27,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. 
[default0]:[2022-09-05 01:24:27,632] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default1]:[2022-09-05 01:24:27,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. [default1]:[2022-09-05 01:24:27,599] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default4]:[2022-09-05 01:24:27,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-05 01:24:27,775] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default6]:[2022-09-05 01:24:27,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. [default6]:[2022-09-05 01:24:27,767] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default2]:[2022-09-05 01:24:27,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. 
[default2]:[2022-09-05 01:24:27,843] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default5]:[2022-09-05 01:24:27,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. [default5]:[2022-09-05 01:24:27,935] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default0]:[2022-09-05 01:24:27,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. [default0]:[2022-09-05 01:24:27,996] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default0]:[2022-09-05 01:24:28,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-05 01:24:28,099] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default4]:[2022-09-05 01:24:28,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. 
[default4]:[2022-09-05 01:24:28,076] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default4]:[2022-09-05 01:24:28,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. [default4]:[2022-09-05 01:24:28,092] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default6]:[2022-09-05 01:24:28,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-05 01:24:28,232] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default2]:[2022-09-05 01:24:28,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. [default2]:[2022-09-05 01:24:28,356] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default7]:[2022-09-05 01:24:28,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. 
[default7]:[2022-09-05 01:24:28,376] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default5]:[2022-09-05 01:24:28,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. [default5]:[2022-09-05 01:24:28,486] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default5]:[2022-09-05 01:24:28,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. [default5]:[2022-09-05 01:24:28,438] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default7]:[2022-09-05 01:24:28,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-05 01:24:28,567] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default5]:[2022-09-05 01:24:28,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. 
[default5]:[2022-09-05 01:24:28,597] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default1]:[2022-09-05 01:24:28,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-05 01:24:28,669] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default4]:[2022-09-05 01:24:28,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-05 01:24:28,681] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default4]:[2022-09-05 01:24:28,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-05 01:24:28,783] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default3]:[2022-09-05 01:24:29,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. 
[default3]:[2022-09-05 01:24:29,113] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default5]:[2022-09-05 01:24:29,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-05 01:24:29,331] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default3]:[2022-09-05 01:24:29,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-05 01:24:29,339] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default7]:[2022-09-05 01:24:29,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. [default7]:[2022-09-05 01:24:29,436] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default1]:[2022-09-05 01:24:29,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. 
[default1]:[2022-09-05 01:24:29,586] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default0]:[2022-09-05 01:24:29,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. [default0]:[2022-09-05 01:24:29,607] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default2]:[2022-09-05 01:24:29,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. [default2]:[2022-09-05 01:24:29,583] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default0]:[2022-09-05 01:24:29,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-05 01:24:29,616] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default6]:[2022-09-05 01:24:29,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. 
[default6]:[2022-09-05 01:24:29,641] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default3]:[2022-09-05 01:24:29,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-05 01:24:29,728] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default7]:[2022-09-05 01:24:29,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. [default7]:[2022-09-05 01:24:29,945] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default7]:[2022-09-05 01:24:29,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. [default7]:[2022-09-05 01:24:29,887] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default0]:[2022-09-05 01:24:30,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. 
[default0]:[2022-09-05 01:24:30,301] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default1]:[2022-09-05 01:24:30,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. [default1]:[2022-09-05 01:24:30,364] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default6]:[2022-09-05 01:24:30,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. [default6]:[2022-09-05 01:24:30,371] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default0]:[2022-09-05 01:24:30,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. [default0]:[2022-09-05 01:24:30,378] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default5]:[2022-09-05 01:24:30,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. 
[default5]:[2022-09-05 01:24:30,647] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default7]:[2022-09-05 01:24:30,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. [default7]:[2022-09-05 01:24:30,763] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default6]:[2022-09-05 01:24:30,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. [default6]:[2022-09-05 01:24:30,803] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default4]:[2022-09-05 01:24:30,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-05 01:24:30,830] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default6]:[2022-09-05 01:24:31,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. 
[default6]:[2022-09-05 01:24:31,184] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default0]:[2022-09-05 01:24:31,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-05 01:24:31,380] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default6]:[2022-09-05 01:24:31,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. [default6]:[2022-09-05 01:24:31,424] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default5]:[2022-09-05 01:24:31,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-05 01:24:31,512] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default2]:[2022-09-05 01:24:31,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. 
[default2]:[2022-09-05 01:24:31,627] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default2]:[2022-09-05 01:24:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. [default2]:[2022-09-05 01:24:31,670] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default2]:[2022-09-05 01:24:31,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-05 01:24:31,769] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default6]:[2022-09-05 01:24:31,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. [default6]:[2022-09-05 01:24:31,880] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default4]:[2022-09-05 01:24:31,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. 
[default4]:[2022-09-05 01:24:31,894] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default7]:[2022-09-05 01:24:31,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. [default7]:[2022-09-05 01:24:31,956] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default7]:[2022-09-05 01:24:32,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. [default7]:[2022-09-05 01:24:32,451] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default3]:[2022-09-05 01:24:32,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-05 01:24:32,459] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default2]:[2022-09-05 01:24:32,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. 
[default2]:[2022-09-05 01:24:32,615] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default3]:[2022-09-05 01:24:32,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. [default3]:[2022-09-05 01:24:32,624] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default1]:[2022-09-05 01:24:32,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-05 01:24:32,641] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default0]:[2022-09-05 01:24:32,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. [default0]:[2022-09-05 01:24:32,717] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default6]:[2022-09-05 01:24:32,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. 
[default6]:[2022-09-05 01:24:32,845] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default7]:[2022-09-05 01:24:33,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. [default7]:[2022-09-05 01:24:33,007] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default2]:[2022-09-05 01:24:33,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. [default2]:[2022-09-05 01:24:33,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default3]:[2022-09-05 01:24:33,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-05 01:24:33,258] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default4]:[2022-09-05 01:24:33,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. 
[default4]:[2022-09-05 01:24:33,245] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default4]:[2022-09-05 01:24:33,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. [default4]:[2022-09-05 01:24:33,454] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default5]:[2022-09-05 01:24:33,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-05 01:24:33,539] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default3]:[2022-09-05 01:24:33,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. [default3]:[2022-09-05 01:24:33,581] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default6]:[2022-09-05 01:24:33,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. 
[default6]:[2022-09-05 01:24:33,692] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default7]:[2022-09-05 01:24:33,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. [default7]:[2022-09-05 01:24:33,839] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default1]:[2022-09-05 01:24:33,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-05 01:24:33,953] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default6]:[2022-09-05 01:24:33,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. [default6]:[2022-09-05 01:24:33,963] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default0]:[2022-09-05 01:24:34,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[default0]:[2022-09-05 01:24:34,066] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [default4]:[2022-09-05 01:24:34,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-05 01:24:34,474] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default2]:[2022-09-05 01:24:34,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. [default2]:[2022-09-05 01:24:34,702] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default5]:[2022-09-05 01:24:34,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-05 01:24:34,695] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default1]:[2022-09-05 01:24:34,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
[default1]:[2022-09-05 01:24:34,760] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default3]:[2022-09-05 01:24:34,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. [default3]:[2022-09-05 01:24:34,710] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default5]:[2022-09-05 01:24:34,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. [default5]:[2022-09-05 01:24:34,822] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default1]:[2022-09-05 01:24:35,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. [default1]:[2022-09-05 01:24:35,174] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default0]:[2022-09-05 01:24:35,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. 
[default0]:[2022-09-05 01:24:35,228] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default7]:[2022-09-05 01:24:36,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-05 01:24:36,814] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default5]:[2022-09-05 01:24:37,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. [default5]:[2022-09-05 01:24:37,592] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default4]:[2022-09-05 01:24:37,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. [default4]:[2022-09-05 01:24:37,764] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default0]:[2022-09-05 01:24:37,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. 
[default0]:[2022-09-05 01:24:37,803] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default6]:[2022-09-05 01:24:37,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-05 01:24:37,904] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default0]:[2022-09-05 01:24:37,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-05 01:24:37,957] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default3]:[2022-09-05 01:24:37,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-05 01:24:37,939] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default1]:[2022-09-05 01:24:37,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. 
[default1]:[2022-09-05 01:24:37,951] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default2]:[2022-09-05 01:24:38,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. [default2]:[2022-09-05 01:24:38,034] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default1]:[2022-09-05 01:24:41,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-05 01:24:41,450] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default7]:[2022-09-05 01:24:42,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. [default7]:[2022-09-05 01:24:42,780] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt [default5]:[2022-09-05 01:24:43,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. 
[default5]:[2022-09-05 01:24:43,589] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default0]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! 
[default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default0]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default0]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! 
[default7]:time (ms) | save-checkpoint: 33917.66
[default0]:  successfully saved checkpoint at iteration 747 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! 
[default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default5]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default0]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! 
[default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default7]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default2]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default1]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default6]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default3]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! [default4]:[2022-09-05 01:24:43,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step747 is ready now! 
[default4]:[2022-09-05 01:24:43,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt.
[default4]:[2022-09-05 01:24:43,690] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step747/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt
[default7]: iteration 748/ 3100 | consumed samples: 1531904 | consumed tokens: 3137339392 | elapsed time per iteration (s): 175.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.752483E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 11.664 | TFLOPs: 119.07 |
[default7]: iteration 749/ 3100 | consumed samples: 1533952 | consumed tokens: 3141533696 | elapsed time per iteration (s): 145.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.824050E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.100 | TFLOPs: 143.94 |
[default7]: iteration 750/ 3100 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 142.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.841483E-01 | grad norm: 27.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.342 | TFLOPs: 146.40 |
[default7]:----------------------------------------------------------------------------------------------------------
[default7]:validation_pretraining loss at iteration 750 | lm loss value: 2.600752E+00 | lm loss PPL: 1.347386E+01 |
[default7]:----------------------------------------------------------------------------------------------------------
[default7]:-----------------------------------------------------------------------------------------
[default7]:valid loss at iteration 750 | lm loss value: 1.283630E+00 | lm loss PPL: 3.609719E+00 |
[default7]:-----------------------------------------------------------------------------------------
[default7]: iteration 751/ 3100 | consumed samples: 1538048 | consumed tokens: 3149922304 | elapsed time per iteration (s): 225.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.893092E-01 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 9.072 | TFLOPs: 92.61 |
[default7]: iteration 752/ 3100 | consumed samples: 1540096 | consumed tokens: 3154116608 | elapsed time per iteration (s): 142.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.752844E-01 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.366 | TFLOPs: 146.66 |
[default7]: iteration 753/ 3100 | consumed samples: 1542144 | consumed tokens: 3158310912 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.796503E-01 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 |
[default7]: iteration 754/ 3100 | consumed samples: 1544192 | consumed tokens: 3162505216 | elapsed time per iteration (s): 140.81 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.880199E-01 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.545 | TFLOPs: 148.48 |
[default7]: iteration 755/ 3100 | consumed samples: 1546240 | consumed tokens: 3166699520 | elapsed time per iteration (s): 143.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.788022E-01 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.313 | TFLOPs: 146.12 |
[default7]: iteration 756/ 3100 | consumed samples: 1548288 | consumed tokens: 3170893824 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.807637E-01 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 |
[default7]: iteration 757/ 3100 | consumed samples: 1550336 | consumed tokens: 3175088128 | elapsed time per iteration (s): 142.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.768735E-01 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.345 | TFLOPs: 146.44 |
[default7]: iteration 758/ 3100 | consumed samples: 1552384 | consumed tokens: 3179282432 | elapsed time per iteration (s): 142.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.840815E-01 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.373 | TFLOPs: 146.73 |
[default7]: iteration 759/ 3100 | consumed samples: 1554432 | consumed tokens: 3183476736 | elapsed time per iteration (s): 144.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.730002E-01 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.150 | TFLOPs: 144.45 |
[default7]: iteration 760/ 3100 | consumed samples: 1556480 | consumed tokens: 3187671040 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.867783E-01 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 |
[default7]: iteration 761/ 3100 | consumed samples: 1558528 | consumed tokens: 3191865344 | elapsed time per iteration (s): 141.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.718541E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.515 | TFLOPs: 148.17 |
[default7]: iteration 762/ 3100 | consumed samples: 1560576 | consumed tokens: 3196059648 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.840972E-01 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 |
[default7]: iteration 763/ 3100 | consumed samples: 1562624 | consumed tokens: 3200253952 | elapsed time per iteration (s): 142.65 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.705067E-01 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.356 | TFLOPs: 146.56 |
[default7]: iteration 764/ 3100 | consumed samples: 1564672 | consumed tokens: 3204448256 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.805389E-01 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.04 |
[default7]: iteration 765/ 3100 | consumed samples: 1566720 | consumed tokens: 3208642560 | elapsed time per iteration (s): 142.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.742360E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.380 | TFLOPs: 146.80 |
[default7]: iteration 766/ 3100 | consumed samples: 1568768 | consumed tokens: 3212836864 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.811632E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 767/ 3100 | consumed samples: 1570816 | consumed tokens: 3217031168 | elapsed time per iteration (s): 141.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.786386E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.525 | TFLOPs: 148.28 |
[default7]: iteration 768/ 3100 | consumed samples: 1572864 | consumed tokens: 3221225472 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.802168E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 |
[default7]: iteration 769/ 3100 | consumed samples: 1574912 | consumed tokens: 3225419776 | elapsed time per iteration (s): 140.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.978237E-01 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.550 | TFLOPs: 148.53 |
[default7]: iteration 770/ 3100 | consumed samples: 1576960 | consumed tokens: 3229614080 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.709838E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 771/ 3100 | consumed samples: 1579008 | consumed tokens: 3233808384 | elapsed time per iteration (s): 142.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.797321E-01 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.391 | TFLOPs: 146.91 |
[default7]: iteration 772/ 3100 | consumed samples: 1581056 | consumed tokens: 3238002688 | elapsed time per iteration (s): 140.97 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.738207E-01 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.528 | TFLOPs: 148.31 |
[default7]: iteration 773/ 3100 | consumed samples: 1583104 | consumed tokens: 3242196992 | elapsed time per iteration (s): 141.03 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.754848E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.522 | TFLOPs: 148.24 |
[default7]: iteration 774/ 3100 | consumed samples: 1585152 | consumed tokens: 3246391296 | elapsed time per iteration (s): 142.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.734551E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.380 | TFLOPs: 146.79 |
[default7]: iteration 775/ 3100 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 142.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.769363E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.361 | TFLOPs: 146.61 |
[default7]: iteration 776/ 3100 | consumed samples: 1589248 | consumed tokens: 3254779904 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.808728E-01 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 |
[default7]: iteration 777/ 3100 | consumed samples: 1591296 | consumed tokens: 3258974208 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.741124E-01 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.14 |
[default7]: iteration 778/ 3100 | consumed samples: 1593344 | consumed tokens: 3263168512 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.708086E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 |
[default7]: iteration 779/ 3100 | consumed samples: 1595392 | consumed tokens: 3267362816 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.746393E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 |
[default7]: iteration 780/ 3100 | consumed samples: 1597440 | consumed tokens: 3271557120 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.762144E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 |
[default7]: iteration 781/ 3100 | consumed samples: 1599488 | consumed tokens: 3275751424 | elapsed time per iteration (s): 142.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.736778E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.333 | TFLOPs: 146.32 |
[default7]: iteration 782/ 3100 | consumed samples: 1601536 | consumed tokens: 3279945728 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.718484E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 |
[default7]: iteration 783/ 3100 | consumed samples: 1603584 | consumed tokens: 3284140032 | elapsed time per iteration (s): 142.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.795088E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.410 | TFLOPs: 147.10 |
[default7]: iteration 784/ 3100 | consumed samples: 1605632 | consumed tokens: 3288334336 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.682607E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 |
[default7]: iteration 785/ 3100 | consumed samples: 1607680 | consumed tokens: 3292528640 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.731088E-01 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.411 | TFLOPs: 147.12 |
[default7]: iteration 786/ 3100 | consumed samples: 1609728 | consumed tokens: 3296722944 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.732409E-01 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 |
[default7]: iteration 787/ 3100 | consumed samples: 1611776 | consumed tokens: 3300917248 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.690096E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 |
[default7]: iteration 788/ 3100 | consumed samples: 1613824 | consumed tokens: 3305111552 | elapsed time per iteration (s): 141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.629743E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.02 |
[default7]: iteration 789/ 3100 | consumed samples: 1615872 | consumed tokens: 3309305856 | elapsed time per iteration (s): 141.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.733644E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.425 | TFLOPs: 147.25 |
[default7]: iteration 790/ 3100 | consumed samples: 1617920 | consumed tokens: 3313500160 | elapsed time per iteration (s): 141.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.824841E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.518 | TFLOPs: 148.20 |
[default7]: iteration 791/ 3100 | consumed samples: 1619968 | consumed tokens: 3317694464 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.676941E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 |
[default7]: iteration 792/ 3100 | consumed samples: 1622016 | consumed tokens: 3321888768 | elapsed time per iteration (s): 142.89 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.631176E-01 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.333 | TFLOPs: 146.32 |
[default7]: iteration 793/ 3100 | consumed samples: 1624064 | consumed tokens: 3326083072 | elapsed time per iteration (s): 142.97 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.660194E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.325 | TFLOPs: 146.24 |
[default7]: iteration 794/ 3100 | consumed samples: 1626112 | consumed tokens: 3330277376 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.667649E-01 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 |
[default7]: iteration 795/ 3100 | consumed samples: 1628160 | consumed tokens: 3334471680 | elapsed time per iteration (s): 142.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.625572E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.342 | TFLOPs: 146.41 |
[default7]: iteration 796/ 3100 | consumed samples: 1630208 | consumed tokens: 3338665984 | elapsed time per iteration (s): 140.87 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.783923E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.538 | TFLOPs: 148.41 |
[default7]: iteration 797/ 3100 | consumed samples: 1632256 | consumed tokens: 3342860288 | elapsed time per iteration (s): 141.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.737855E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.443 | TFLOPs: 147.44 |
[default7]: iteration 798/ 3100 | consumed samples: 1634304 | consumed tokens: 3347054592 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.608361E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 |
[default7]: iteration 799/ 3100 | consumed samples: 1636352 | consumed tokens: 3351248896 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.694084E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.507 | TFLOPs: 148.09 |
[default7]: iteration 800/ 3100 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.647535E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 |
[default7]: iteration 801/ 3100 | consumed samples: 1640448 | consumed tokens: 3359637504 | elapsed time per iteration (s): 141.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.649920E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.495 | TFLOPs: 147.98 |
[default7]: iteration 802/ 3100 | consumed samples: 1642496 | consumed tokens: 3363831808 | elapsed time per iteration (s): 142.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.594289E-01 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.372 | TFLOPs: 146.72 |
[default7]: iteration 803/ 3100 | consumed samples: 1644544 | consumed tokens: 3368026112 | elapsed time per iteration (s): 141.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.650746E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.453 | TFLOPs: 147.54 |
[default7]: iteration 804/ 3100 | consumed samples: 1646592 | consumed tokens: 3372220416 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.585262E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 |
[default7]: iteration 805/ 3100 | consumed samples: 1648640 | consumed tokens: 3376414720 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.654656E-01 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 |
[default7]: iteration 806/ 3100 | consumed samples: 1650688 | consumed tokens: 3380609024 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.609366E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.450 | TFLOPs: 147.52 |
[default7]: iteration 807/ 3100 | consumed samples: 1652736 | consumed tokens: 3384803328 | elapsed time per iteration (s): 141.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.652391E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples
per second: 14.463 | TFLOPs: 147.64 | [default7]: iteration 808/ 3100 | consumed samples: 1654784 | consumed tokens: 3388997632 | elapsed time per iteration (s): 142.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.660564E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.353 | TFLOPs: 146.52 | [default7]: iteration 809/ 3100 | consumed samples: 1656832 | consumed tokens: 3393191936 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.674340E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 810/ 3100 | consumed samples: 1658880 | consumed tokens: 3397386240 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.676868E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.514 | TFLOPs: 148.16 | [default7]: iteration 811/ 3100 | consumed samples: 1660928 | consumed tokens: 3401580544 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.574076E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 812/ 3100 | consumed samples: 1662976 | consumed tokens: 3405774848 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.643032E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 813/ 3100 | consumed samples: 1665024 | consumed tokens: 3409969152 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 
2048 | lm loss: 5.682949E-01 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 814/ 3100 | consumed samples: 1667072 | consumed tokens: 3414163456 | elapsed time per iteration (s): 142.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.646919E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.345 | TFLOPs: 146.44 | [default7]: iteration 815/ 3100 | consumed samples: 1669120 | consumed tokens: 3418357760 | elapsed time per iteration (s): 143.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.626214E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.313 | TFLOPs: 146.11 | [default7]: iteration 816/ 3100 | consumed samples: 1671168 | consumed tokens: 3422552064 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.555832E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.04 | [default7]: iteration 817/ 3100 | consumed samples: 1673216 | consumed tokens: 3426746368 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.684150E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 818/ 3100 | consumed samples: 1675264 | consumed tokens: 3430940672 | elapsed time per iteration (s): 142.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.601861E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.417 | TFLOPs: 147.17 | [default7]: iteration 819/ 3100 | consumed 
samples: 1677312 | consumed tokens: 3435134976 | elapsed time per iteration (s): 142.81 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.632024E-01 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.340 | TFLOPs: 146.39 | [default7]: iteration 820/ 3100 | consumed samples: 1679360 | consumed tokens: 3439329280 | elapsed time per iteration (s): 141.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.630506E-01 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.521 | TFLOPs: 148.23 | [default7]: iteration 821/ 3100 | consumed samples: 1681408 | consumed tokens: 3443523584 | elapsed time per iteration (s): 142.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.623487E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.379 | TFLOPs: 146.79 | [default7]: iteration 822/ 3100 | consumed samples: 1683456 | consumed tokens: 3447717888 | elapsed time per iteration (s): 142.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.685959E-01 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.329 | TFLOPs: 146.28 | [default7]: iteration 823/ 3100 | consumed samples: 1685504 | consumed tokens: 3451912192 | elapsed time per iteration (s): 142.78 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.555161E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.343 | TFLOPs: 146.42 | [default7]: iteration 824/ 3100 | consumed samples: 1687552 | consumed tokens: 3456106496 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.595163E-01 | grad norm: 0.334 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.79 | [default7]: iteration 825/ 3100 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 142.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.588449E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.354 | TFLOPs: 146.54 | [default7]: iteration 826/ 3100 | consumed samples: 1691648 | consumed tokens: 3464495104 | elapsed time per iteration (s): 141.55 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.480747E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.468 | TFLOPs: 147.70 | [default7]: iteration 827/ 3100 | consumed samples: 1693696 | consumed tokens: 3468689408 | elapsed time per iteration (s): 140.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.556676E-01 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.526 | TFLOPs: 148.28 | [default7]: iteration 828/ 3100 | consumed samples: 1695744 | consumed tokens: 3472883712 | elapsed time per iteration (s): 141.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.638841E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 | [default7]: iteration 829/ 3100 | consumed samples: 1697792 | consumed tokens: 3477078016 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.620458E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.469 | TFLOPs: 147.71 | [default7]: iteration 830/ 3100 | consumed samples: 1699840 | consumed tokens: 3481272320 | elapsed time per iteration 
(s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.588673E-01 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 831/ 3100 | consumed samples: 1701888 | consumed tokens: 3485466624 | elapsed time per iteration (s): 142.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.701776E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.397 | TFLOPs: 146.97 | [default7]: iteration 832/ 3100 | consumed samples: 1703936 | consumed tokens: 3489660928 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.590308E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 | [default7]: iteration 833/ 3100 | consumed samples: 1705984 | consumed tokens: 3493855232 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.555193E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 834/ 3100 | consumed samples: 1708032 | consumed tokens: 3498049536 | elapsed time per iteration (s): 140.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.556335E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.537 | TFLOPs: 148.40 | [default7]: iteration 835/ 3100 | consumed samples: 1710080 | consumed tokens: 3502243840 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.640954E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | 
TFLOPs: 147.52 | [default7]: iteration 836/ 3100 | consumed samples: 1712128 | consumed tokens: 3506438144 | elapsed time per iteration (s): 142.19 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.587322E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.403 | TFLOPs: 147.03 | [default7]: iteration 837/ 3100 | consumed samples: 1714176 | consumed tokens: 3510632448 | elapsed time per iteration (s): 142.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.585517E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.370 | TFLOPs: 146.70 | [default7]: iteration 838/ 3100 | consumed samples: 1716224 | consumed tokens: 3514826752 | elapsed time per iteration (s): 141.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.624009E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.505 | TFLOPs: 148.07 | [default7]: iteration 839/ 3100 | consumed samples: 1718272 | consumed tokens: 3519021056 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.650039E-01 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 840/ 3100 | consumed samples: 1720320 | consumed tokens: 3523215360 | elapsed time per iteration (s): 142.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.596187E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.359 | TFLOPs: 146.58 | [default7]: iteration 841/ 3100 | consumed samples: 1722368 | consumed tokens: 3527409664 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.579563E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.474 | TFLOPs: 147.76 | [default7]: iteration 842/ 3100 | consumed samples: 1724416 | consumed tokens: 3531603968 | elapsed time per iteration (s): 142.76 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.602415E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.346 | TFLOPs: 146.45 | [default7]: iteration 843/ 3100 | consumed samples: 1726464 | consumed tokens: 3535798272 | elapsed time per iteration (s): 142.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.585562E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.332 | TFLOPs: 146.30 | [default7]: iteration 844/ 3100 | consumed samples: 1728512 | consumed tokens: 3539992576 | elapsed time per iteration (s): 142.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.606657E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.409 | TFLOPs: 147.10 | [default7]: iteration 845/ 3100 | consumed samples: 1730560 | consumed tokens: 3544186880 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.534143E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 846/ 3100 | consumed samples: 1732608 | consumed tokens: 3548381184 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.538564E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 847/ 3100 | consumed samples: 
1734656 | consumed tokens: 3552575488 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.526292E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 848/ 3100 | consumed samples: 1736704 | consumed tokens: 3556769792 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.521759E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 849/ 3100 | consumed samples: 1738752 | consumed tokens: 3560964096 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.492053E-01 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 850/ 3100 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 142.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.547689E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.358 | TFLOPs: 146.57 | [default7]: iteration 851/ 3100 | consumed samples: 1742848 | consumed tokens: 3569352704 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.413431E-01 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.474 | TFLOPs: 147.75 | [default7]: iteration 852/ 3100 | consumed samples: 1744896 | consumed tokens: 3573547008 | elapsed time per iteration (s): 142.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.514139E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.381 | TFLOPs: 146.81 | [default7]: iteration 853/ 3100 | consumed samples: 1746944 | consumed tokens: 3577741312 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.582590E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | [default7]: iteration 854/ 3100 | consumed samples: 1748992 | consumed tokens: 3581935616 | elapsed time per iteration (s): 143.04 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.542514E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.318 | TFLOPs: 146.16 | [default7]: iteration 855/ 3100 | consumed samples: 1751040 | consumed tokens: 3586129920 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.616976E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 | [default7]: iteration 856/ 3100 | consumed samples: 1753088 | consumed tokens: 3590324224 | elapsed time per iteration (s): 142.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.474802E-01 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.392 | TFLOPs: 146.92 | [default7]: iteration 857/ 3100 | consumed samples: 1755136 | consumed tokens: 3594518528 | elapsed time per iteration (s): 142.85 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.493602E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.336 | TFLOPs: 146.35 | [default7]: iteration 858/ 3100 | consumed samples: 1757184 | consumed tokens: 3598712832 | elapsed time per iteration (s): 
142.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.543900E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.369 | TFLOPs: 146.68 | [default7]: iteration 859/ 3100 | consumed samples: 1759232 | consumed tokens: 3602907136 | elapsed time per iteration (s): 141.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.478656E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.495 | TFLOPs: 147.97 | [default7]: iteration 860/ 3100 | consumed samples: 1761280 | consumed tokens: 3607101440 | elapsed time per iteration (s): 142.75 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.533274E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.347 | TFLOPs: 146.46 | [default7]: iteration 861/ 3100 | consumed samples: 1763328 | consumed tokens: 3611295744 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.516811E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.450 | TFLOPs: 147.51 | [default7]: iteration 862/ 3100 | consumed samples: 1765376 | consumed tokens: 3615490048 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.623091E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.469 | TFLOPs: 147.71 | [default7]: iteration 863/ 3100 | consumed samples: 1767424 | consumed tokens: 3619684352 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.424619E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | 
TFLOPs: 148.06 | [default7]: iteration 864/ 3100 | consumed samples: 1769472 | consumed tokens: 3623878656 | elapsed time per iteration (s): 141.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.491994E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.460 | TFLOPs: 147.61 | [default7]: iteration 865/ 3100 | consumed samples: 1771520 | consumed tokens: 3628072960 | elapsed time per iteration (s): 140.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.558689E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.546 | TFLOPs: 148.50 | [default7]: iteration 866/ 3100 | consumed samples: 1773568 | consumed tokens: 3632267264 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.446196E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 867/ 3100 | consumed samples: 1775616 | consumed tokens: 3636461568 | elapsed time per iteration (s): 141.57 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.452459E-01 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.467 | TFLOPs: 147.68 | [default7]: iteration 868/ 3100 | consumed samples: 1777664 | consumed tokens: 3640655872 | elapsed time per iteration (s): 141.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.518500E-01 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.446 | TFLOPs: 147.48 | [default7]: iteration 869/ 3100 | consumed samples: 1779712 | consumed tokens: 3644850176 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.514950E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 870/ 3100 | consumed samples: 1781760 | consumed tokens: 3649044480 | elapsed time per iteration (s): 142.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.467404E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.377 | TFLOPs: 146.76 | [default7]: iteration 871/ 3100 | consumed samples: 1783808 | consumed tokens: 3653238784 | elapsed time per iteration (s): 142.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.433201E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.355 | TFLOPs: 146.54 | [default7]: iteration 872/ 3100 | consumed samples: 1785856 | consumed tokens: 3657433088 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.543152E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 873/ 3100 | consumed samples: 1787904 | consumed tokens: 3661627392 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.473446E-01 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 874/ 3100 | consumed samples: 1789952 | consumed tokens: 3665821696 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.497150E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 875/ 3100 | consumed samples: 
1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 142.82 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.446318E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.339 | TFLOPs: 146.38 | [default7]: iteration 876/ 3100 | consumed samples: 1794048 | consumed tokens: 3674210304 | elapsed time per iteration (s): 142.58 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.521231E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.364 | TFLOPs: 146.63 | [default7]: iteration 877/ 3100 | consumed samples: 1796096 | consumed tokens: 3678404608 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.460010E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 878/ 3100 | consumed samples: 1798144 | consumed tokens: 3682598912 | elapsed time per iteration (s): 142.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.512410E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.380 | TFLOPs: 146.80 | [default7]: iteration 879/ 3100 | consumed samples: 1800192 | consumed tokens: 3686793216 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.459820E-01 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 880/ 3100 | consumed samples: 1802240 | consumed tokens: 3690987520 | elapsed time per iteration (s): 141.90 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.484347E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.433 | TFLOPs: 147.34 | [default7]: iteration 881/ 3100 | consumed samples: 1804288 | consumed tokens: 3695181824 | elapsed time per iteration (s): 141.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.450066E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.516 | TFLOPs: 148.18 | [default7]: iteration 882/ 3100 | consumed samples: 1806336 | consumed tokens: 3699376128 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.422710E-01 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 | [default7]: iteration 883/ 3100 | consumed samples: 1808384 | consumed tokens: 3703570432 | elapsed time per iteration (s): 142.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.436232E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.373 | TFLOPs: 146.72 | [default7]: iteration 884/ 3100 | consumed samples: 1810432 | consumed tokens: 3707764736 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.473047E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 885/ 3100 | consumed samples: 1812480 | consumed tokens: 3711959040 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.401107E-01 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 886/ 3100 | consumed samples: 1814528 | consumed tokens: 3716153344 | elapsed time per iteration (s): 
142.77 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.502357E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.345 | TFLOPs: 146.44 | [default7]: iteration 887/ 3100 | consumed samples: 1816576 | consumed tokens: 3720347648 | elapsed time per iteration (s): 142.84 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.470721E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.338 | TFLOPs: 146.36 | [default7]: iteration 888/ 3100 | consumed samples: 1818624 | consumed tokens: 3724541952 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.531379E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.85 | [default7]: iteration 889/ 3100 | consumed samples: 1820672 | consumed tokens: 3728736256 | elapsed time per iteration (s): 142.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.486525E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.372 | TFLOPs: 146.72 | [default7]: iteration 890/ 3100 | consumed samples: 1822720 | consumed tokens: 3732930560 | elapsed time per iteration (s): 141.49 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.477930E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 | [default7]: iteration 891/ 3100 | consumed samples: 1824768 | consumed tokens: 3737124864 | elapsed time per iteration (s): 141.02 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.440043E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.522 | 
TFLOPs: 148.25 | [default7]: iteration 892/ 3100 | consumed samples: 1826816 | consumed tokens: 3741319168 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.480964E-01 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 893/ 3100 | consumed samples: 1828864 | consumed tokens: 3745513472 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.464991E-01 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 894/ 3100 | consumed samples: 1830912 | consumed tokens: 3749707776 | elapsed time per iteration (s): 140.63 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.460266E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.563 | TFLOPs: 148.67 | [default7]: iteration 895/ 3100 | consumed samples: 1832960 | consumed tokens: 3753902080 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.486554E-01 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 896/ 3100 | consumed samples: 1835008 | consumed tokens: 3758096384 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.462060E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.82 | [default7]: iteration 897/ 3100 | consumed samples: 1837056 | consumed tokens: 3762290688 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 
5.432891E-01 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 898/ 3100 | consumed samples: 1839104 | consumed tokens: 3766484992 | elapsed time per iteration (s): 143.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.435425E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.299 | TFLOPs: 145.97 | [default7]: iteration 899/ 3100 | consumed samples: 1841152 | consumed tokens: 3770679296 | elapsed time per iteration (s): 142.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.412012E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.422 | TFLOPs: 147.23 | [default7]: iteration 900/ 3100 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.451383E-01 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 901/ 3100 | consumed samples: 1845248 | consumed tokens: 3779067904 | elapsed time per iteration (s): 141.66 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.418550E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.59 | [default7]: iteration 902/ 3100 | consumed samples: 1847296 | consumed tokens: 3783262208 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.365264E-01 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.05 | [default7]: iteration 903/ 3100 | consumed samples: 
1849344 | consumed tokens: 3787456512 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.444424E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 904/ 3100 | consumed samples: 1851392 | consumed tokens: 3791650816 | elapsed time per iteration (s): 142.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.460082E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.391 | TFLOPs: 146.91 | [default7]: iteration 905/ 3100 | consumed samples: 1853440 | consumed tokens: 3795845120 | elapsed time per iteration (s): 141.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.403237E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.437 | TFLOPs: 147.38 | [default7]: iteration 906/ 3100 | consumed samples: 1855488 | consumed tokens: 3800039424 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.333321E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 907/ 3100 | consumed samples: 1857536 | consumed tokens: 3804233728 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.440751E-01 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 908/ 3100 | consumed samples: 1859584 | consumed tokens: 3808428032 | elapsed time per iteration (s): 141.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.445868E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.11 | [default7]: iteration 909/ 3100 | consumed samples: 1861632 | consumed tokens: 3812622336 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.402400E-01 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 910/ 3100 | consumed samples: 1863680 | consumed tokens: 3816816640 | elapsed time per iteration (s): 141.38 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.327716E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 911/ 3100 | consumed samples: 1865728 | consumed tokens: 3821010944 | elapsed time per iteration (s): 141.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.386177E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.465 | TFLOPs: 147.66 | [default7]: iteration 912/ 3100 | consumed samples: 1867776 | consumed tokens: 3825205248 | elapsed time per iteration (s): 141.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.409765E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.430 | TFLOPs: 147.31 | [default7]: iteration 913/ 3100 | consumed samples: 1869824 | consumed tokens: 3829399552 | elapsed time per iteration (s): 141.92 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.400348E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.431 | TFLOPs: 147.32 | [default7]: iteration 914/ 3100 | consumed samples: 1871872 | consumed tokens: 3833593856 | elapsed time per iteration (s): 
142.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.444111E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.336 | TFLOPs: 146.34 | [default7]: iteration 915/ 3100 | consumed samples: 1873920 | consumed tokens: 3837788160 | elapsed time per iteration (s): 141.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.345719E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.493 | TFLOPs: 147.95 | WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam06-ib0_3640068_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam13-ib0_1968793_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam52-ib0_1786170_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam07-ib0_3962821_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam44-ib0_1588191_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam11-ib0_1979443_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. 
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam02-ib0_3644008_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam32-ib0_520731_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam08-ib0_2939187_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam30-ib0_3600586_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam39-ib0_1381719_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam45-ib0_416062_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam37-ib0_3160801_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam05-ib0_3028198_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam47-ib0_935604_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. 
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam46-ib0_3922277_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam28-ib0_3616042_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam34-ib0_1722514_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam18-ib0_2645373_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam14-ib0_2235844_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam04-ib0_1988464_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam41-ib0_2678031_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam38-ib0_3792286_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam03-ib0_1898990_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. 
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam36-ib0_1808309_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam09-ib0_2024868_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam42-ib0_3048317_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam33-ib0_378391_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam15-ib0_2143754_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam26-ib0_428012_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam31-ib0_521616_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam27-ib0_256523_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam40-ib0_1327371_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. 
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam19-ib0_1450811_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'jean-zay-iam35-ib0_1560017_0' has failed to send a keep-alive heartbeat to the rendezvous 'none' due to an error of type RendezvousTimeoutError. [default7]: iteration 916/ 3100 | consumed samples: 1875968 | consumed tokens: 3841982464 | elapsed time per iteration (s): 145.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.375472E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.115 | TFLOPs: 144.09 | [default7]: iteration 917/ 3100 | consumed samples: 1878016 | consumed tokens: 3846176768 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.335149E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 918/ 3100 | consumed samples: 1880064 | consumed tokens: 3850371072 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.355835E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.411 | TFLOPs: 147.12 | [default7]: iteration 919/ 3100 | consumed samples: 1882112 | consumed tokens: 3854565376 | elapsed time per iteration (s): 141.54 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.373458E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.469 | TFLOPs: 147.71 | [default7]: iteration 920/ 3100 | consumed samples: 1884160 | consumed tokens: 3858759680 | elapsed time per iteration (s): 143.06 | 
learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.281558E-01 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.316 | TFLOPs: 146.14 | [default7]: iteration 921/ 3100 | consumed samples: 1886208 | consumed tokens: 3862953984 | elapsed time per iteration (s): 141.88 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.392529E-01 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.435 | TFLOPs: 147.36 | [default7]: iteration 922/ 3100 | consumed samples: 1888256 | consumed tokens: 3867148288 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.417534E-01 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 923/ 3100 | consumed samples: 1890304 | consumed tokens: 3871342592 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.325736E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 | [default7]: iteration 924/ 3100 | consumed samples: 1892352 | consumed tokens: 3875536896 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.401465E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 925/ 3100 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 141.10 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.417886E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.514 | TFLOPs: 
148.17 | [default7]: iteration 926/ 3100 | consumed samples: 1896448 | consumed tokens: 3883925504 | elapsed time per iteration (s): 142.06 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.406285E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.416 | TFLOPs: 147.17 | [default7]: iteration 927/ 3100 | consumed samples: 1898496 | consumed tokens: 3888119808 | elapsed time per iteration (s): 141.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.418576E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.426 | TFLOPs: 147.27 | [default7]: iteration 928/ 3100 | consumed samples: 1900544 | consumed tokens: 3892314112 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.258060E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 929/ 3100 | consumed samples: 1902592 | consumed tokens: 3896508416 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.341617E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 930/ 3100 | consumed samples: 1904640 | consumed tokens: 3900702720 | elapsed time per iteration (s): 141.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.351350E-01 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.494 | TFLOPs: 147.96 | [default7]: iteration 931/ 3100 | consumed samples: 1906688 | consumed tokens: 3904897024 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.284452E-01 | 
grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 932/ 3100 | consumed samples: 1908736 | consumed tokens: 3909091328 | elapsed time per iteration (s): 141.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.287497E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.475 | TFLOPs: 147.77 | [default7]: iteration 933/ 3100 | consumed samples: 1910784 | consumed tokens: 3913285632 | elapsed time per iteration (s): 142.01 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.286189E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.421 | TFLOPs: 147.22 | [default7]: iteration 934/ 3100 | consumed samples: 1912832 | consumed tokens: 3917479936 | elapsed time per iteration (s): 142.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.285094E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.394 | TFLOPs: 146.94 | [default7]: iteration 935/ 3100 | consumed samples: 1914880 | consumed tokens: 3921674240 | elapsed time per iteration (s): 142.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.373519E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.400 | TFLOPs: 147.00 | [default7]: iteration 936/ 3100 | consumed samples: 1916928 | consumed tokens: 3925868544 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.313629E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 937/ 3100 | consumed samples: 1918976 | consumed 
tokens: 3930062848 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.314416E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 938/ 3100 | consumed samples: 1921024 | consumed tokens: 3934257152 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.449638E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 939/ 3100 | consumed samples: 1923072 | consumed tokens: 3938451456 | elapsed time per iteration (s): 142.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.401199E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.361 | TFLOPs: 146.60 | [default7]: iteration 940/ 3100 | consumed samples: 1925120 | consumed tokens: 3942645760 | elapsed time per iteration (s): 142.98 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.252451E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.324 | TFLOPs: 146.23 | [default7]: iteration 941/ 3100 | consumed samples: 1927168 | consumed tokens: 3946840064 | elapsed time per iteration (s): 142.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.282248E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.349 | TFLOPs: 146.48 | [default7]: iteration 942/ 3100 | consumed samples: 1929216 | consumed tokens: 3951034368 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.383102E-01 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 943/ 3100 | consumed samples: 1931264 | consumed tokens: 3955228672 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.246662E-01 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 944/ 3100 | consumed samples: 1933312 | consumed tokens: 3959422976 | elapsed time per iteration (s): 142.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.372051E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.386 | TFLOPs: 146.86 | [default7]: iteration 945/ 3100 | consumed samples: 1935360 | consumed tokens: 3963617280 | elapsed time per iteration (s): 142.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.338870E-01 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.409 | TFLOPs: 147.09 | [default7]: iteration 946/ 3100 | consumed samples: 1937408 | consumed tokens: 3967811584 | elapsed time per iteration (s): 146.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.324695E-01 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 13.969 | TFLOPs: 142.60 | [default7]: iteration 947/ 3100 | consumed samples: 1939456 | consumed tokens: 3972005888 | elapsed time per iteration (s): 141.69 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.345860E-01 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.454 | TFLOPs: 147.55 | [default7]: iteration 948/ 3100 | consumed samples: 1941504 | consumed tokens: 3976200192 | elapsed time per iteration (s): 141.47 | learning rate: 
2.000E-05 | global batch size: 2048 | lm loss: 5.323384E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.476 | TFLOPs: 147.78 | [default7]: iteration 949/ 3100 | consumed samples: 1943552 | consumed tokens: 3980394496 | elapsed time per iteration (s): 142.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.319108E-01 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.397 | TFLOPs: 146.97 | [default7]: iteration 950/ 3100 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 141.51 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.293511E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.74 | [default7]: iteration 951/ 3100 | consumed samples: 1947648 | consumed tokens: 3988783104 | elapsed time per iteration (s): 141.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.341309E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.462 | TFLOPs: 147.63 | [default7]: iteration 952/ 3100 | consumed samples: 1949696 | consumed tokens: 3992977408 | elapsed time per iteration (s): 141.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.273818E-01 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.494 | TFLOPs: 147.96 | [default7]: iteration 953/ 3100 | consumed samples: 1951744 | consumed tokens: 3997171712 | elapsed time per iteration (s): 142.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.213486E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.375 | TFLOPs: 146.75 | [default7]: 
iteration 954/ 3100 | consumed samples: 1953792 | consumed tokens: 4001366016 | elapsed time per iteration (s): 142.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.366608E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.322 | TFLOPs: 146.21 | [default7]: iteration 955/ 3100 | consumed samples: 1955840 | consumed tokens: 4005560320 | elapsed time per iteration (s): 142.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.287546E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.348 | TFLOPs: 146.47 | [default7]: iteration 956/ 3100 | consumed samples: 1957888 | consumed tokens: 4009754624 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.304930E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.514 | TFLOPs: 148.16 | [default7]: iteration 957/ 3100 | consumed samples: 1959936 | consumed tokens: 4013948928 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.300333E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.02 | [default7]: iteration 958/ 3100 | consumed samples: 1961984 | consumed tokens: 4018143232 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.244679E-01 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.91 | [default7]: iteration 959/ 3100 | consumed samples: 1964032 | consumed tokens: 4022337536 | elapsed time per iteration (s): 141.80 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.341851E-01 | grad norm: 0.340 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.443 | TFLOPs: 147.44 | [default7]: iteration 960/ 3100 | consumed samples: 1966080 | consumed tokens: 4026531840 | elapsed time per iteration (s): 141.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.228328E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.464 | TFLOPs: 147.65 | [default7]: iteration 961/ 3100 | consumed samples: 1968128 | consumed tokens: 4030726144 | elapsed time per iteration (s): 142.31 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.333199E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.391 | TFLOPs: 146.91 | [default7]: iteration 962/ 3100 | consumed samples: 1970176 | consumed tokens: 4034920448 | elapsed time per iteration (s): 141.56 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.351328E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.467 | TFLOPs: 147.69 | [default7]: iteration 963/ 3100 | consumed samples: 1972224 | consumed tokens: 4039114752 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.253862E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.503 | TFLOPs: 148.05 | [default7]: iteration 964/ 3100 | consumed samples: 1974272 | consumed tokens: 4043309056 | elapsed time per iteration (s): 141.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.216832E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.485 | TFLOPs: 147.87 | [default7]: iteration 965/ 3100 | consumed samples: 1976320 | consumed tokens: 4047503360 | 
elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.237691E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 966/ 3100 | consumed samples: 1978368 | consumed tokens: 4051697664 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.284940E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 967/ 3100 | consumed samples: 1980416 | consumed tokens: 4055891968 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.208547E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.14 | [default7]: iteration 968/ 3100 | consumed samples: 1982464 | consumed tokens: 4060086272 | elapsed time per iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.219368E-01 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.00 | [default7]: iteration 969/ 3100 | consumed samples: 1984512 | consumed tokens: 4064280576 | elapsed time per iteration (s): 141.28 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.271530E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.496 | TFLOPs: 147.98 | [default7]: iteration 970/ 3100 | consumed samples: 1986560 | consumed tokens: 4068474880 | elapsed time per iteration (s): 141.62 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.275118E-01 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 14.461 | TFLOPs: 147.62 | [default7]: iteration 971/ 3100 | consumed samples: 1988608 | consumed tokens: 4072669184 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.192925E-01 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 972/ 3100 | consumed samples: 1990656 | consumed tokens: 4076863488 | elapsed time per iteration (s): 141.74 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.143256E-01 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.449 | TFLOPs: 147.50 | [default7]: iteration 973/ 3100 | consumed samples: 1992704 | consumed tokens: 4081057792 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.266827E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.487 | TFLOPs: 147.89 | [default7]: iteration 974/ 3100 | consumed samples: 1994752 | consumed tokens: 4085252096 | elapsed time per iteration (s): 141.37 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.241774E-01 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.486 | TFLOPs: 147.88 | [default7]: iteration 975/ 3100 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 142.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.237892E-01 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.388 | TFLOPs: 146.88 | [default7]: iteration 976/ 3100 | consumed samples: 1998848 | consumed tokens: 4093640704 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch 
size: 2048 | lm loss: 5.232274E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 | [default7]: iteration 977/ 3100 | consumed samples: 2000896 | consumed tokens: 4097835008 | elapsed time per iteration (s): 141.70 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.183110E-01 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.453 | TFLOPs: 147.54 | [default7]: iteration 978/ 3100 | consumed samples: 2002944 | consumed tokens: 4102029312 | elapsed time per iteration (s): 141.22 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.191039E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.502 | TFLOPs: 148.05 | [default7]: iteration 979/ 3100 | consumed samples: 2004992 | consumed tokens: 4106223616 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.267354E-01 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 980/ 3100 | consumed samples: 2007040 | consumed tokens: 4110417920 | elapsed time per iteration (s): 141.72 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.220588E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.451 | TFLOPs: 147.52 | [default7]: iteration 981/ 3100 | consumed samples: 2009088 | consumed tokens: 4114612224 | elapsed time per iteration (s): 141.68 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.206864E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.455 | TFLOPs: 147.57 | [default7]: iteration 982/ 3100 | 
consumed samples: 2011136 | consumed tokens: 4118806528 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.240747E-01 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 983/ 3100 | consumed samples: 2013184 | consumed tokens: 4123000832 | elapsed time per iteration (s): 141.53 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.238681E-01 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.471 | TFLOPs: 147.72 | [default7]: iteration 984/ 3100 | consumed samples: 2015232 | consumed tokens: 4127195136 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.234189E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.14 | [default7]: iteration 985/ 3100 | consumed samples: 2017280 | consumed tokens: 4131389440 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.154694E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 986/ 3100 | consumed samples: 2019328 | consumed tokens: 4135583744 | elapsed time per iteration (s): 141.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.299780E-01 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.495 | TFLOPs: 147.97 | [default7]: iteration 987/ 3100 | consumed samples: 2021376 | consumed tokens: 4139778048 | elapsed time per iteration (s): 141.61 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.183088E-01 | grad norm: 0.333 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.462 | TFLOPs: 147.64 | [default7]: iteration 988/ 3100 | consumed samples: 2023424 | consumed tokens: 4143972352 | elapsed time per iteration (s): 141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.207254E-01 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.02 | [default7]: iteration 989/ 3100 | consumed samples: 2025472 | consumed tokens: 4148166656 | elapsed time per iteration (s): 142.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.264053E-01 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 990/ 3100 | consumed samples: 2027520 | consumed tokens: 4152360960 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.155163E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.80 | [default7]: iteration 991/ 3100 | consumed samples: 2029568 | consumed tokens: 4156555264 | elapsed time per iteration (s): 141.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.191193E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.456 | TFLOPs: 147.57 | [default7]: iteration 992/ 3100 | consumed samples: 2031616 | consumed tokens: 4160749568 | elapsed time per iteration (s): 141.16 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.296208E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.11 | [default7]: iteration 993/ 3100 | consumed samples: 2033664 | consumed tokens: 4164943872 | elapsed time per 
iteration (s): 141.26 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.226057E-01 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.498 | TFLOPs: 148.01 | [default7]: iteration 994/ 3100 | consumed samples: 2035712 | consumed tokens: 4169138176 | elapsed time per iteration (s): 141.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.132506E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.457 | TFLOPs: 147.58 | [default7]: iteration 995/ 3100 | consumed samples: 2037760 | consumed tokens: 4173332480 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.192435E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 996/ 3100 | consumed samples: 2039808 | consumed tokens: 4177526784 | elapsed time per iteration (s): 141.41 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.132035E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.483 | TFLOPs: 147.84 | [default0]:saving checkpoint at iteration 996 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-05 11:14:36,602] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step996 is begin to save! [default4]:[2022-09-05 11:14:36,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_15-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_22-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_26-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_67-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_25-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_72-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_52-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_65-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_68-model_00-model_states.pt... 
[default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_35-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_04-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_30-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_50-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_29-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_17-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_06-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_05-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_16-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_45-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_39-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_66-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_41-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_63-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_31-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_55-model_00-model_states.pt... 
[default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_23-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_07-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_54-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_69-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_56-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_51-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_32-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_38-model_00-model_states.pt... 
[default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_47-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_57-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_40-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_60-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_43-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_20-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_01-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_46-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_34-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_59-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_12-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_71-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_18-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_53-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_28-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_62-model_00-model_states.pt... 
[default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_09-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_58-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_33-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_36-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_24-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_71_model_states.pt... [default4]:[2022-09-05 11:14:36,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_71_model_states.pt. [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_64-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:36,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_70-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_27-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_03-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_44-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_48-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_37-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_13-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_49-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_08-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_14-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_11-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_61-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_19-model_00-model_states.pt... [default4]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_21-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_10-model_00-model_states.pt... [default0]:[2022-09-05 11:14:36,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_42-model_00-model_states.pt... 
[default4]:[2022-09-05 11:14:39,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_29-model_00-model_states.pt. [default4]:[2022-09-05 11:14:39,794] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_27_model_states.pt... [default4]:[2022-09-05 11:14:39,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_27_model_states.pt. [default0]:[2022-09-05 11:14:39,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_28-model_00-model_states.pt. [default0]:[2022-09-05 11:14:39,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_26_model_states.pt... [default0]:[2022-09-05 11:14:39,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_26_model_states.pt. [default0]:[2022-09-05 11:14:40,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_70-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_68_model_states.pt... [default0]:[2022-09-05 11:14:40,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_68_model_states.pt. 
[default4]:[2022-09-05 11:14:40,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_19-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,019] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_17_model_states.pt... [default4]:[2022-09-05 11:14:40,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_17_model_states.pt. [default4]:[2022-09-05 11:14:40,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_15-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_13_model_states.pt... [default4]:[2022-09-05 11:14:40,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_13_model_states.pt. [default0]:[2022-09-05 11:14:40,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_72-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_74-model_00-model_states.pt... 
[default0]:[2022-09-05 11:14:40,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_74-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,103] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_70_model_states.pt... [default0]:[2022-09-05 11:14:40,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_70_model_states.pt. [default4]:[2022-09-05 11:14:40,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_41-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,045] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_39_model_states.pt... [default4]:[2022-09-05 11:14:40,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_39_model_states.pt. [default4]:[2022-09-05 11:14:40,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_55-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_53_model_states.pt... [default4]:[2022-09-05 11:14:40,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_53_model_states.pt. 
[default0]:[2022-09-05 11:14:40,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_32-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_30_model_states.pt... [default0]:[2022-09-05 11:14:40,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_30_model_states.pt. [default0]:[2022-09-05 11:14:40,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_40-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,097] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_38_model_states.pt... [default0]:[2022-09-05 11:14:40,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_38_model_states.pt. [default0]:[2022-09-05 11:14:40,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_20-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,133] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_18_model_states.pt... [default0]:[2022-09-05 11:14:40,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_18_model_states.pt. 
[default0]:[2022-09-05 11:14:40,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_14-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_12_model_states.pt... [default0]:[2022-09-05 11:14:40,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_12_model_states.pt. [default4]:[2022-09-05 11:14:40,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_21-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_19_model_states.pt... [default4]:[2022-09-05 11:14:40,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_19_model_states.pt. [default4]:[2022-09-05 11:14:40,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_67-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_65_model_states.pt... [default0]:[2022-09-05 11:14:40,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_18-model_00-model_states.pt. 
[default0]:[2022-09-05 11:14:40,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_16_model_states.pt... [default0]:[2022-09-05 11:14:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_16_model_states.pt. [default4]:[2022-09-05 11:14:40,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_33-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_31_model_states.pt... [default4]:[2022-09-05 11:14:40,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_31_model_states.pt. [default0]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_36-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_34_model_states.pt... [default0]:[2022-09-05 11:14:40,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_34_model_states.pt. [default4]:[2022-09-05 11:14:40,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_27-model_00-model_states.pt. 
[default4]:[2022-09-05 11:14:40,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_25_model_states.pt... [default4]:[2022-09-05 11:14:40,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_25_model_states.pt. [default4]:[2022-09-05 11:14:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_03-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_01_model_states.pt... [default4]:[2022-09-05 11:14:40,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_01_model_states.pt. [default0]:[2022-09-05 11:14:40,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_48-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_46_model_states.pt... [default0]:[2022-09-05 11:14:40,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_46_model_states.pt. [default0]:[2022-09-05 11:14:40,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_44-model_00-model_states.pt. 
[default0]:[2022-09-05 11:14:40,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_42_model_states.pt... [default4]:[2022-09-05 11:14:40,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_65_model_states.pt. [default4]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_25-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_23_model_states.pt... [default4]:[2022-09-05 11:14:40,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_23_model_states.pt. [default0]:[2022-09-05 11:14:40,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_04-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_02_model_states.pt... [default0]:[2022-09-05 11:14:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_02_model_states.pt. [default4]:[2022-09-05 11:14:40,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_05-model_00-model_states.pt. 
[default4]:[2022-09-05 11:14:40,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_03_model_states.pt... [default4]:[2022-09-05 11:14:40,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_03_model_states.pt. [default0]:[2022-09-05 11:14:40,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_16-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_14_model_states.pt... [default0]:[2022-09-05 11:14:40,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_14_model_states.pt. [default4]:[2022-09-05 11:14:40,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_31-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,257] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_29_model_states.pt... [default4]:[2022-09-05 11:14:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_29_model_states.pt. [default0]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_54-model_00-model_states.pt. 
[default0]:[2022-09-05 11:14:40,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_52_model_states.pt... [default0]:[2022-09-05 11:14:40,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_52_model_states.pt. [default0]:[2022-09-05 11:14:40,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_34-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_32_model_states.pt... [default0]:[2022-09-05 11:14:40,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_12-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_10_model_states.pt... [default0]:[2022-09-05 11:14:40,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_10_model_states.pt. [default0]:[2022-09-05 11:14:40,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_64-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_62_model_states.pt... 
[default4]:[2022-09-05 11:14:40,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_37-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_35_model_states.pt... [default4]:[2022-09-05 11:14:40,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_35_model_states.pt. [default4]:[2022-09-05 11:14:40,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_13-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_11_model_states.pt... [default4]:[2022-09-05 11:14:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_11_model_states.pt. [default0]:[2022-09-05 11:14:40,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_08-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_06_model_states.pt... [default0]:[2022-09-05 11:14:40,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_06_model_states.pt. 
[default0]:[2022-09-05 11:14:40,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_42_model_states.pt. [default0]:[2022-09-05 11:14:40,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_26-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_24_model_states.pt... [default0]:[2022-09-05 11:14:40,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_24_model_states.pt. [default0]:[2022-09-05 11:14:40,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_68-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_66_model_states.pt... [default0]:[2022-09-05 11:14:40,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_66_model_states.pt. [default0]:[2022-09-05 11:14:40,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_50-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_48_model_states.pt... 
[default0]:[2022-09-05 11:14:40,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_48_model_states.pt. [default4]:[2022-09-05 11:14:40,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_17-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_15_model_states.pt... [default4]:[2022-09-05 11:14:40,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_15_model_states.pt. [default4]:[2022-09-05 11:14:40,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_23-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,400] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_21_model_states.pt... [default4]:[2022-09-05 11:14:40,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_21_model_states.pt. [default4]:[2022-09-05 11:14:40,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_69-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_67_model_states.pt... 
[default4]:[2022-09-05 11:14:40,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_67_model_states.pt. [default0]:[2022-09-05 11:14:40,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_32_model_states.pt. [default4]:[2022-09-05 11:14:40,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_71-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_69_model_states.pt... [default4]:[2022-09-05 11:14:40,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_69_model_states.pt. [default4]:[2022-09-05 11:14:40,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_09-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,403] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_07_model_states.pt... [default4]:[2022-09-05 11:14:40,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_07_model_states.pt. [default0]:[2022-09-05 11:14:40,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_62-model_00-model_states.pt. 
[default0]:[2022-09-05 11:14:40,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_60_model_states.pt... [default0]:[2022-09-05 11:14:40,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_60_model_states.pt. [default0]:[2022-09-05 11:14:40,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_24-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_22_model_states.pt... [default0]:[2022-09-05 11:14:40,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_22_model_states.pt. [default0]:[2022-09-05 11:14:40,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_62_model_states.pt. [default4]:[2022-09-05 11:14:40,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_49-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_47_model_states.pt... [default4]:[2022-09-05 11:14:40,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_47_model_states.pt. 
[default4]:[2022-09-05 11:14:40,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_11-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_09_model_states.pt... [default4]:[2022-09-05 11:14:40,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_09_model_states.pt. [default0]:[2022-09-05 11:14:40,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_10-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_08_model_states.pt... [default0]:[2022-09-05 11:14:40,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_08_model_states.pt. [default0]:[2022-09-05 11:14:40,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_22-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,427] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_20_model_states.pt... [default0]:[2022-09-05 11:14:40,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_20_model_states.pt. 
[default0]:[2022-09-05 11:14:40,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_52-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_50_model_states.pt... [default4]:[2022-09-05 11:14:40,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_35-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_33_model_states.pt... [default4]:[2022-09-05 11:14:40,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_33_model_states.pt. [default0]:[2022-09-05 11:14:40,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_30-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_28_model_states.pt... [default0]:[2022-09-05 11:14:40,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_28_model_states.pt. [default0]:[2022-09-05 11:14:40,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_06-model_00-model_states.pt. 
[default0]:[2022-09-05 11:14:40,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_04_model_states.pt... [default0]:[2022-09-05 11:14:40,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_04_model_states.pt. [default4]:[2022-09-05 11:14:40,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_45-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,454] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_43_model_states.pt... [default4]:[2022-09-05 11:14:40,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_43_model_states.pt. [default0]:[2022-09-05 11:14:40,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_66-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_64_model_states.pt... [default0]:[2022-09-05 11:14:40,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_64_model_states.pt. [default4]:[2022-09-05 11:14:40,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_63-model_00-model_states.pt. 
[default4]:[2022-09-05 11:14:40,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_61_model_states.pt... [default4]:[2022-09-05 11:14:40,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_61_model_states.pt. [default4]:[2022-09-05 11:14:40,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_07-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,454] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_05_model_states.pt... [default4]:[2022-09-05 11:14:40,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_05_model_states.pt. [default0]:[2022-09-05 11:14:40,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_38-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_36_model_states.pt... [default0]:[2022-09-05 11:14:40,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_36_model_states.pt. [default4]:[2022-09-05 11:14:40,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_51-model_00-model_states.pt. 
[default4]:[2022-09-05 11:14:40,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_49_model_states.pt... [default4]:[2022-09-05 11:14:40,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_49_model_states.pt. [default4]:[2022-09-05 11:14:40,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_47-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_45_model_states.pt... [default4]:[2022-09-05 11:14:40,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_45_model_states.pt. [default4]:[2022-09-05 11:14:40,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_43-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_41_model_states.pt... [default4]:[2022-09-05 11:14:40,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_41_model_states.pt. [default4]:[2022-09-05 11:14:40,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_59-model_00-model_states.pt. 
[default4]:[2022-09-05 11:14:40,513] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_57_model_states.pt... [default4]:[2022-09-05 11:14:40,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_57_model_states.pt. [default0]:[2022-09-05 11:14:40,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_42-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_40_model_states.pt... [default0]:[2022-09-05 11:14:40,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_40_model_states.pt. [default0]:[2022-09-05 11:14:40,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_50_model_states.pt. [default4]:[2022-09-05 11:14:40,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_65-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_63_model_states.pt... [default4]:[2022-09-05 11:14:40,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_63_model_states.pt. 
[default4]:[2022-09-05 11:14:40,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_39-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_37_model_states.pt... [default4]:[2022-09-05 11:14:40,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_37_model_states.pt. [default0]:[2022-09-05 11:14:40,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_60-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,627] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_58_model_states.pt... [default0]:[2022-09-05 11:14:40,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_58_model_states.pt. [default0]:[2022-09-05 11:14:40,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_46-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,583] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_44_model_states.pt... [default0]:[2022-09-05 11:14:40,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_44_model_states.pt. 
[default4]:[2022-09-05 11:14:40,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_53-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_51_model_states.pt... [default4]:[2022-09-05 11:14:40,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_51_model_states.pt. [default0]:[2022-09-05 11:14:40,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_58-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_56_model_states.pt... [default0]:[2022-09-05 11:14:40,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_56_model_states.pt. [default4]:[2022-09-05 11:14:40,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_61-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,610] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_59_model_states.pt... [default4]:[2022-09-05 11:14:40,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_59_model_states.pt. 
[default0]:[2022-09-05 11:14:40,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_56-model_00-model_states.pt. [default0]:[2022-09-05 11:14:40,901] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_54_model_states.pt... [default0]:[2022-09-05 11:14:40,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_54_model_states.pt. [default4]:[2022-09-05 11:14:40,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_57-model_00-model_states.pt. [default4]:[2022-09-05 11:14:40,870] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_55_model_states.pt... [default4]:[2022-09-05 11:14:40,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_55_model_states.pt. [default0]:[2022-09-05 11:14:41,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/layer_01-model_00-model_states.pt. [default0]:[2022-09-05 11:14:41,510] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_00_model_states.pt [default0]:[2022-09-05 11:14:41,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 11:14:41,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/mp_rank_00_model_states.pt. [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... 
[default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... 
[default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... 
[default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... 
[default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... 
[default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... 
[default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... 
[default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... 
[default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... 
[default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... 
[default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... 
[default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... 
[default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... 
[default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... 
[default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... 
[default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... 
[default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... [default5]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default4]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... 
[default6]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default7]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default2]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default1]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default0]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... 
[default3]:[2022-09-05 11:14:41,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default2]:[2022-09-05 11:14:49,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. [default2]:[2022-09-05 11:14:49,079] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default2]:[2022-09-05 11:14:49,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. [default2]:[2022-09-05 11:14:49,223] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default3]:[2022-09-05 11:14:49,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-05 11:14:49,238] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default4]:[2022-09-05 11:14:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. 
[default4]:[2022-09-05 11:14:49,498] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default1]:[2022-09-05 11:14:49,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. [default1]:[2022-09-05 11:14:49,558] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default2]:[2022-09-05 11:14:49,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. [default2]:[2022-09-05 11:14:49,527] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default2]:[2022-09-05 11:14:49,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-05 11:14:49,520] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default5]:[2022-09-05 11:14:49,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. 
[default5]:[2022-09-05 11:14:49,593] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default0]:[2022-09-05 11:14:49,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. [default0]:[2022-09-05 11:14:49,527] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default3]:[2022-09-05 11:14:49,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-05 11:14:49,682] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default0]:[2022-09-05 11:14:49,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-05 11:14:49,722] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default6]:[2022-09-05 11:14:49,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. 
[default6]:[2022-09-05 11:14:49,734] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default4]:[2022-09-05 11:14:49,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-05 11:14:49,722] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default0]:[2022-09-05 11:14:49,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-05 11:14:49,792] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default5]:[2022-09-05 11:14:49,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-05 11:14:49,909] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default7]:[2022-09-05 11:14:49,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. 
[default7]:[2022-09-05 11:14:49,916] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default0]:[2022-09-05 11:14:49,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. [default0]:[2022-09-05 11:14:49,924] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default7]:[2022-09-05 11:14:49,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-05 11:14:49,967] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default0]:[2022-09-05 11:14:50,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-05 11:14:50,016] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default2]:[2022-09-05 11:14:49,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. 
[default2]:[2022-09-05 11:14:49,997] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default3]:[2022-09-05 11:14:50,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-05 11:14:50,058] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default0]:[2022-09-05 11:14:50,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-05 11:14:50,052] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default7]:[2022-09-05 11:14:50,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. [default7]:[2022-09-05 11:14:50,042] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default5]:[2022-09-05 11:14:50,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. 
[default5]:[2022-09-05 11:14:50,095] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default4]:[2022-09-05 11:14:50,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. [default4]:[2022-09-05 11:14:50,167] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default1]:[2022-09-05 11:14:50,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-05 11:14:50,137] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default3]:[2022-09-05 11:14:50,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. [default3]:[2022-09-05 11:14:50,252] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default7]:[2022-09-05 11:14:50,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. 
[default7]:[2022-09-05 11:14:50,233] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default0]:[2022-09-05 11:14:50,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-05 11:14:50,210] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default6]:[2022-09-05 11:14:50,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. [default6]:[2022-09-05 11:14:50,304] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default5]:[2022-09-05 11:14:50,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. [default5]:[2022-09-05 11:14:50,223] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default2]:[2022-09-05 11:14:50,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. 
[default2]:[2022-09-05 11:14:50,312] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default3]:[2022-09-05 11:14:50,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-05 11:14:50,329] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default6]:[2022-09-05 11:14:50,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. [default6]:[2022-09-05 11:14:50,360] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default1]:[2022-09-05 11:14:50,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. [default1]:[2022-09-05 11:14:50,409] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default5]:[2022-09-05 11:14:50,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. 
[default5]:[2022-09-05 11:14:50,423] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default1]:[2022-09-05 11:14:50,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-05 11:14:50,466] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default3]:[2022-09-05 11:14:50,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-05 11:14:50,505] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default1]:[2022-09-05 11:14:50,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. [default1]:[2022-09-05 11:14:50,454] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default5]:[2022-09-05 11:14:50,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. 
[default5]:[2022-09-05 11:14:50,426] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default1]:[2022-09-05 11:14:50,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. [default1]:[2022-09-05 11:14:50,511] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default5]:[2022-09-05 11:14:50,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. [default5]:[2022-09-05 11:14:50,522] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default0]:[2022-09-05 11:14:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. [default0]:[2022-09-05 11:14:50,582] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default4]:[2022-09-05 11:14:50,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. 
[default4]:[2022-09-05 11:14:50,547] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default7]:[2022-09-05 11:14:50,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. [default7]:[2022-09-05 11:14:50,527] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default1]:[2022-09-05 11:14:50,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. [default1]:[2022-09-05 11:14:50,537] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default7]:[2022-09-05 11:14:50,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-05 11:14:50,621] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default6]:[2022-09-05 11:14:50,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. 
[default6]:[2022-09-05 11:14:50,679] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default7]:[2022-09-05 11:14:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. [default7]:[2022-09-05 11:14:50,686] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default0]:[2022-09-05 11:14:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. [default0]:[2022-09-05 11:14:50,688] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default4]:[2022-09-05 11:14:50,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-05 11:14:50,649] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default0]:[2022-09-05 11:14:50,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. 
[default0]:[2022-09-05 11:14:50,634] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default3]:[2022-09-05 11:14:50,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. [default3]:[2022-09-05 11:14:50,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default4]:[2022-09-05 11:14:50,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-05 11:14:50,663] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default6]:[2022-09-05 11:14:50,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. [default6]:[2022-09-05 11:14:50,754] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default3]:[2022-09-05 11:14:50,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. 
[default3]:[2022-09-05 11:14:50,765] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default6]:[2022-09-05 11:14:50,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. [default6]:[2022-09-05 11:14:50,715] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default4]:[2022-09-05 11:14:50,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. [default4]:[2022-09-05 11:14:50,725] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default3]:[2022-09-05 11:14:50,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. [default3]:[2022-09-05 11:14:50,713] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default0]:[2022-09-05 11:14:50,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. 
[default0]:[2022-09-05 11:14:50,751] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default2]:[2022-09-05 11:14:50,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. [default2]:[2022-09-05 11:14:50,708] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default2]:[2022-09-05 11:14:50,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. [default2]:[2022-09-05 11:14:50,755] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default1]:[2022-09-05 11:14:50,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. [default1]:[2022-09-05 11:14:50,780] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default7]:[2022-09-05 11:14:50,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. 
[default7]:[2022-09-05 11:14:50,823] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default1]:[2022-09-05 11:14:50,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-05 11:14:50,799] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default7]:[2022-09-05 11:14:50,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. [default7]:[2022-09-05 11:14:50,814] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default2]:[2022-09-05 11:14:50,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-05 11:14:50,839] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default1]:[2022-09-05 11:14:50,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. 
[default1]:[2022-09-05 11:14:50,822] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default4]:[2022-09-05 11:14:50,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. [default4]:[2022-09-05 11:14:50,931] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default7]:[2022-09-05 11:14:50,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-05 11:14:50,891] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default4]:[2022-09-05 11:14:50,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. [default4]:[2022-09-05 11:14:50,921] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default6]:[2022-09-05 11:14:50,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. 
[default6]:[2022-09-05 11:14:50,969] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default6]:[2022-09-05 11:14:50,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-05 11:14:50,940] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default2]:[2022-09-05 11:14:50,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. [default2]:[2022-09-05 11:14:50,938] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default6]:[2022-09-05 11:14:50,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. [default6]:[2022-09-05 11:14:50,932] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default4]:[2022-09-05 11:14:51,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. 
[default4]:[2022-09-05 11:14:51,002] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default0]:[2022-09-05 11:14:51,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-05 11:14:51,033] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default3]:[2022-09-05 11:14:51,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-05 11:14:51,021] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default0]:[2022-09-05 11:14:51,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-05 11:14:51,045] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default6]:[2022-09-05 11:14:51,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. 
[default6]:[2022-09-05 11:14:51,019] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default2]:[2022-09-05 11:14:51,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-05 11:14:51,037] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default6]:[2022-09-05 11:14:51,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-05 11:14:51,062] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default6]:[2022-09-05 11:14:51,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. [default6]:[2022-09-05 11:14:51,030] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default7]:[2022-09-05 11:14:51,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. 
[default7]:[2022-09-05 11:14:51,071] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default5]:[2022-09-05 11:14:51,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. [default5]:[2022-09-05 11:14:51,096] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default7]:[2022-09-05 11:14:51,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. [default7]:[2022-09-05 11:14:51,127] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default5]:[2022-09-05 11:14:51,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. [default5]:[2022-09-05 11:14:51,147] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default6]:[2022-09-05 11:14:51,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. 
[default6]:[2022-09-05 11:14:51,081] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default5]:[2022-09-05 11:14:51,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. [default5]:[2022-09-05 11:14:51,105] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default5]:[2022-09-05 11:14:51,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. [default5]:[2022-09-05 11:14:51,118] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default6]:[2022-09-05 11:14:51,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-05 11:14:51,129] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default7]:[2022-09-05 11:14:51,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. 
[default7]:[2022-09-05 11:14:51,160] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default0]:[2022-09-05 11:14:51,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. [default0]:[2022-09-05 11:14:51,241] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default6]:[2022-09-05 11:14:51,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. [default6]:[2022-09-05 11:14:51,165] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default5]:[2022-09-05 11:14:51,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-05 11:14:51,187] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default4]:[2022-09-05 11:14:51,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. 
[default4]:[2022-09-05 11:14:51,177] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default1]:[2022-09-05 11:14:51,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-05 11:14:51,294] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default7]:[2022-09-05 11:14:51,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. [default7]:[2022-09-05 11:14:51,271] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default5]:[2022-09-05 11:14:51,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-05 11:14:51,269] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default3]:[2022-09-05 11:14:51,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. 
[default3]:[2022-09-05 11:14:51,263] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default2]:[2022-09-05 11:14:51,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-05 11:14:51,301] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default5]:[2022-09-05 11:14:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. [default5]:[2022-09-05 11:14:51,350] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default0]:[2022-09-05 11:14:51,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. [default0]:[2022-09-05 11:14:51,327] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default2]:[2022-09-05 11:14:51,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. 
[default2]:[2022-09-05 11:14:51,347] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default1]:[2022-09-05 11:14:51,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. [default1]:[2022-09-05 11:14:51,435] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default5]:[2022-09-05 11:14:51,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-05 11:14:51,446] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default6]:[2022-09-05 11:14:51,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-05 11:14:51,383] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default1]:[2022-09-05 11:14:51,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. 
[default1]:[2022-09-05 11:14:51,375] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default1]:[2022-09-05 11:14:51,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. [default1]:[2022-09-05 11:14:51,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default6]:[2022-09-05 11:14:51,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. [default6]:[2022-09-05 11:14:51,408] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default5]:[2022-09-05 11:14:51,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-05 11:14:51,490] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default5]:[2022-09-05 11:14:51,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. 
[default5]:[2022-09-05 11:14:51,420] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default1]:[2022-09-05 11:14:51,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. [default1]:[2022-09-05 11:14:51,448] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default3]:[2022-09-05 11:14:51,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. [default3]:[2022-09-05 11:14:51,422] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default1]:[2022-09-05 11:14:51,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. [default1]:[2022-09-05 11:14:51,475] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default5]:[2022-09-05 11:14:51,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. 
[default5]:[2022-09-05 11:14:51,494] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default4]:[2022-09-05 11:14:51,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-05 11:14:51,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default3]:[2022-09-05 11:14:51,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-05 11:14:51,565] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default4]:[2022-09-05 11:14:51,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. [default4]:[2022-09-05 11:14:51,540] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default0]:[2022-09-05 11:14:51,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. 
[default0]:[2022-09-05 11:14:51,549] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default4]:[2022-09-05 11:14:51,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. [default4]:[2022-09-05 11:14:51,566] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default5]:[2022-09-05 11:14:51,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-05 11:14:51,551] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default7]:[2022-09-05 11:14:51,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-05 11:14:51,573] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default7]:[2022-09-05 11:14:51,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. 
[default7]:[2022-09-05 11:14:51,562] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default3]:[2022-09-05 11:14:51,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-05 11:14:51,604] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default2]:[2022-09-05 11:14:51,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-05 11:14:51,691] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default1]:[2022-09-05 11:14:51,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. [default1]:[2022-09-05 11:14:51,659] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default2]:[2022-09-05 11:14:51,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. 
[default2]:[2022-09-05 11:14:51,721] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default3]:[2022-09-05 11:14:51,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-05 11:14:51,696] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default3]:[2022-09-05 11:14:51,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. [default3]:[2022-09-05 11:14:51,731] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default7]:[2022-09-05 11:14:51,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-05 11:14:51,704] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default3]:[2022-09-05 11:14:51,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. 
[default3]:[2022-09-05 11:14:51,738] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default1]:[2022-09-05 11:14:51,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. [default1]:[2022-09-05 11:14:51,655] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default1]:[2022-09-05 11:14:51,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-05 11:14:51,681] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default3]:[2022-09-05 11:14:51,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-05 11:14:51,714] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default3]:[2022-09-05 11:14:51,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. 
[default3]:[2022-09-05 11:14:51,718] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default2]:[2022-09-05 11:14:51,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. [default2]:[2022-09-05 11:14:51,714] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default7]:[2022-09-05 11:14:51,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. [default7]:[2022-09-05 11:14:51,729] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default5]:[2022-09-05 11:14:51,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. [default5]:[2022-09-05 11:14:51,801] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default0]:[2022-09-05 11:14:51,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. 
[default0]:[2022-09-05 11:14:51,826] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default6]:[2022-09-05 11:14:51,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. [default6]:[2022-09-05 11:14:51,868] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default7]:[2022-09-05 11:14:51,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. [default7]:[2022-09-05 11:14:51,877] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default2]:[2022-09-05 11:14:51,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. [default2]:[2022-09-05 11:14:51,888] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default5]:[2022-09-05 11:14:51,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. 
[default5]:[2022-09-05 11:14:51,940] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default0]:[2022-09-05 11:14:51,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-05 11:14:51,984] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default0]:[2022-09-05 11:14:52,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. [default0]:[2022-09-05 11:14:52,012] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default5]:[2022-09-05 11:14:52,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. [default5]:[2022-09-05 11:14:52,050] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default6]:[2022-09-05 11:14:52,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. 
[default6]:[2022-09-05 11:14:52,034] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default7]:[2022-09-05 11:14:52,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. [default7]:[2022-09-05 11:14:52,081] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default2]:[2022-09-05 11:14:52,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-05 11:14:52,043] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default1]:[2022-09-05 11:14:52,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. [default1]:[2022-09-05 11:14:52,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default5]:[2022-09-05 11:14:52,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. 
[default5]:[2022-09-05 11:14:52,087] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default6]:[2022-09-05 11:14:52,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-05 11:14:52,136] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default3]:[2022-09-05 11:14:52,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. [default3]:[2022-09-05 11:14:52,129] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default0]:[2022-09-05 11:14:52,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. [default0]:[2022-09-05 11:14:52,175] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default2]:[2022-09-05 11:14:52,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. 
[default2]:[2022-09-05 11:14:52,165] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default1]:[2022-09-05 11:14:52,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-05 11:14:52,208] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default4]:[2022-09-05 11:14:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. [default4]:[2022-09-05 11:14:52,261] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default0]:[2022-09-05 11:14:52,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. [default0]:[2022-09-05 11:14:52,251] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default0]:[2022-09-05 11:14:52,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. 
[default0]:[2022-09-05 11:14:52,287] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default4]:[2022-09-05 11:14:52,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. [default4]:[2022-09-05 11:14:52,393] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default7]:[2022-09-05 11:14:52,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. [default7]:[2022-09-05 11:14:52,321] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default4]:[2022-09-05 11:14:52,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-05 11:14:52,352] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default7]:[2022-09-05 11:14:52,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. 
[default7]:[2022-09-05 11:14:52,437] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default3]:[2022-09-05 11:14:52,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-05 11:14:52,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default2]:[2022-09-05 11:14:52,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. [default2]:[2022-09-05 11:14:52,586] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default1]:[2022-09-05 11:14:52,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-05 11:14:52,639] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default1]:[2022-09-05 11:14:52,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. 
[default1]:[2022-09-05 11:14:52,647] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default4]:[2022-09-05 11:14:52,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-05 11:14:52,694] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default2]:[2022-09-05 11:14:52,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-05 11:14:52,662] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default4]:[2022-09-05 11:14:52,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. [default4]:[2022-09-05 11:14:52,721] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default6]:[2022-09-05 11:14:52,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. 
[default6]:[2022-09-05 11:14:52,783] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default4]:[2022-09-05 11:14:52,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-05 11:14:52,750] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default0]:[2022-09-05 11:14:52,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-05 11:14:52,738] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default4]:[2022-09-05 11:14:52,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. [default4]:[2022-09-05 11:14:52,890] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default2]:[2022-09-05 11:14:52,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. 
[default2]:[2022-09-05 11:14:52,857] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default3]:[2022-09-05 11:14:52,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. [default3]:[2022-09-05 11:14:52,844] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default1]:[2022-09-05 11:14:52,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-05 11:14:52,893] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default3]:[2022-09-05 11:14:52,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. [default3]:[2022-09-05 11:14:52,883] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default6]:[2022-09-05 11:14:52,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. 
[default6]:[2022-09-05 11:14:52,999] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default2]:[2022-09-05 11:14:52,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. [default2]:[2022-09-05 11:14:52,930] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default7]:[2022-09-05 11:14:52,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. [default7]:[2022-09-05 11:14:52,976] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default3]:[2022-09-05 11:14:52,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-05 11:14:52,943] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default0]:[2022-09-05 11:14:53,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. 
[default0]:[2022-09-05 11:14:53,016] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default3]:[2022-09-05 11:14:53,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-05 11:14:53,139] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default3]:[2022-09-05 11:14:53,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-05 11:14:53,316] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default0]:[2022-09-05 11:14:53,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. [default0]:[2022-09-05 11:14:53,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default7]:[2022-09-05 11:14:53,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. 
[default7]:[2022-09-05 11:14:53,425] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default4]:[2022-09-05 11:14:53,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-05 11:14:53,444] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default2]:[2022-09-05 11:14:53,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. [default2]:[2022-09-05 11:14:53,452] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default5]:[2022-09-05 11:14:53,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-05 11:14:53,431] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default3]:[2022-09-05 11:14:53,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. 
[default3]:[2022-09-05 11:14:53,539] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default2]:[2022-09-05 11:14:53,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. [default2]:[2022-09-05 11:14:53,566] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default7]:[2022-09-05 11:14:53,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. [default7]:[2022-09-05 11:14:53,588] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default4]:[2022-09-05 11:14:53,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. [default4]:[2022-09-05 11:14:53,630] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default7]:[2022-09-05 11:14:53,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. 
[default7]:[2022-09-05 11:14:53,620] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default3]:[2022-09-05 11:14:53,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-05 11:14:53,737] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default5]:[2022-09-05 11:14:53,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. [default5]:[2022-09-05 11:14:53,698] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default2]:[2022-09-05 11:14:53,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. [default2]:[2022-09-05 11:14:53,758] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default1]:[2022-09-05 11:14:53,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. 
[default1]:[2022-09-05 11:14:53,810] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default0]:[2022-09-05 11:14:53,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-05 11:14:53,799] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default5]:[2022-09-05 11:14:53,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. [default5]:[2022-09-05 11:14:53,865] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default1]:[2022-09-05 11:14:53,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. [default1]:[2022-09-05 11:14:53,875] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default1]:[2022-09-05 11:14:53,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. 
[default1]:[2022-09-05 11:14:53,968] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default1]:[2022-09-05 11:14:53,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. [default1]:[2022-09-05 11:14:53,962] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default1]:[2022-09-05 11:14:53,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. [default1]:[2022-09-05 11:14:53,935] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default4]:[2022-09-05 11:14:54,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. [default4]:[2022-09-05 11:14:54,103] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default2]:[2022-09-05 11:14:54,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. 
[default2]:[2022-09-05 11:14:54,154] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default0]:[2022-09-05 11:14:54,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. [default0]:[2022-09-05 11:14:54,172] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default7]:[2022-09-05 11:14:54,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. [default7]:[2022-09-05 11:14:54,242] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default6]:[2022-09-05 11:14:54,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. [default6]:[2022-09-05 11:14:54,271] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default0]:[2022-09-05 11:14:54,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. 
[default0]:[2022-09-05 11:14:54,296] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default4]:[2022-09-05 11:14:54,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-05 11:14:54,347] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default6]:[2022-09-05 11:14:54,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. [default6]:[2022-09-05 11:14:54,444] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default2]:[2022-09-05 11:14:54,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-05 11:14:54,493] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default5]:[2022-09-05 11:14:54,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. 
[default5]:[2022-09-05 11:14:54,791] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default5]:[2022-09-05 11:14:54,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-05 11:14:54,814] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default0]:[2022-09-05 11:14:54,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. [default0]:[2022-09-05 11:14:54,843] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default3]:[2022-09-05 11:14:54,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. [default3]:[2022-09-05 11:14:54,857] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default6]:[2022-09-05 11:14:54,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. 
[default6]:[2022-09-05 11:14:54,956] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default7]:[2022-09-05 11:14:54,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-05 11:14:54,979] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default4]:[2022-09-05 11:14:54,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-05 11:14:54,957] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default6]:[2022-09-05 11:14:54,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. [default6]:[2022-09-05 11:14:54,972] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default7]:[2022-09-05 11:14:55,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. 
[default7]:[2022-09-05 11:14:55,063] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default3]:[2022-09-05 11:14:55,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. [default3]:[2022-09-05 11:14:55,230] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default4]:[2022-09-05 11:14:55,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. [default4]:[2022-09-05 11:14:55,288] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default2]:[2022-09-05 11:14:55,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. [default2]:[2022-09-05 11:14:55,319] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default3]:[2022-09-05 11:14:55,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. 
[default3]:[2022-09-05 11:14:55,318] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default6]:[2022-09-05 11:14:55,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-05 11:14:55,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default6]:[2022-09-05 11:14:55,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. [default6]:[2022-09-05 11:14:55,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default1]:[2022-09-05 11:14:55,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. [default1]:[2022-09-05 11:14:55,531] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default2]:[2022-09-05 11:14:55,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. 
[default2]:[2022-09-05 11:14:55,793] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default4]:[2022-09-05 11:14:55,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. [default4]:[2022-09-05 11:14:55,845] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default4]:[2022-09-05 11:14:55,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-05 11:14:55,943] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default6]:[2022-09-05 11:14:55,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. [default6]:[2022-09-05 11:14:55,942] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default0]:[2022-09-05 11:14:56,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. 
[default0]:[2022-09-05 11:14:56,074] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default5]:[2022-09-05 11:14:56,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. [default5]:[2022-09-05 11:14:56,140] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default7]:[2022-09-05 11:14:56,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-05 11:14:56,176] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default4]:[2022-09-05 11:14:56,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-05 11:14:56,170] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default6]:[2022-09-05 11:14:56,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. 
[default6]:[2022-09-05 11:14:56,254] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default7]:[2022-09-05 11:14:56,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. [default7]:[2022-09-05 11:14:56,396] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default4]:[2022-09-05 11:14:56,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-05 11:14:56,410] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default5]:[2022-09-05 11:14:56,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. [default5]:[2022-09-05 11:14:56,583] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default6]:[2022-09-05 11:14:56,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. 
[default6]:[2022-09-05 11:14:56,641] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default6]:[2022-09-05 11:14:56,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. [default6]:[2022-09-05 11:14:56,739] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default7]:[2022-09-05 11:14:57,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. [default7]:[2022-09-05 11:14:57,454] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default5]:[2022-09-05 11:14:57,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. [default5]:[2022-09-05 11:14:57,459] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default6]:[2022-09-05 11:14:57,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. 
[default6]:[2022-09-05 11:14:57,580] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default0]:[2022-09-05 11:14:58,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. [default0]:[2022-09-05 11:14:58,113] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default2]:[2022-09-05 11:14:58,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-05 11:14:58,346] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default1]:[2022-09-05 11:14:58,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-05 11:14:58,398] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default7]:[2022-09-05 11:14:58,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. 
[default7]:[2022-09-05 11:14:58,445] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default0]:[2022-09-05 11:14:58,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-05 11:14:58,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default3]:[2022-09-05 11:14:58,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. [default3]:[2022-09-05 11:14:58,478] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default2]:[2022-09-05 11:14:58,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. [default2]:[2022-09-05 11:14:58,443] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default1]:[2022-09-05 11:14:58,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. 
[default1]:[2022-09-05 11:14:58,458] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default3]:[2022-09-05 11:14:58,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-05 11:14:58,476] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default2]:[2022-09-05 11:14:58,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. [default2]:[2022-09-05 11:14:58,542] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default4]:[2022-09-05 11:14:58,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-05 11:14:58,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default5]:[2022-09-05 11:14:58,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. 
[default5]:[2022-09-05 11:14:58,622] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default5]:[2022-09-05 11:14:58,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-05 11:14:58,920] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default0]:[2022-09-05 11:14:59,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. [default0]:[2022-09-05 11:14:59,259] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default4]:[2022-09-05 11:14:59,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-05 11:14:59,258] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default1]:[2022-09-05 11:14:59,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. 
[default1]:[2022-09-05 11:14:59,447] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default2]:[2022-09-05 11:15:00,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. [default2]:[2022-09-05 11:15:00,263] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default3]:[2022-09-05 11:15:00,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-05 11:15:00,591] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default5]:[2022-09-05 11:15:00,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. [default5]:[2022-09-05 11:15:00,774] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default4]:[2022-09-05 11:15:00,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. 
[default4]:[2022-09-05 11:15:00,763] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt [default6]:[2022-09-05 11:15:00,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-05 11:15:00,804] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default0]:[2022-09-05 11:15:01,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. [default0]:[2022-09-05 11:15:01,437] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default1]:[2022-09-05 11:15:01,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-05 11:15:01,525] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default7]:[2022-09-05 11:15:02,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt. 
[default7]:[2022-09-05 11:15:02,147] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt [default0]:[2022-09-05 11:15:05,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. [default0]:[2022-09-05 11:15:05,101] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default2]:[2022-09-05 11:15:05,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. [default2]:[2022-09-05 11:15:05,295] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default3]:[2022-09-05 11:15:07,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-05 11:15:07,991] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default1]:[2022-09-05 11:15:07,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. 
[default1]:[2022-09-05 11:15:07,999] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default4]:[2022-09-05 11:15:08,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. [default4]:[2022-09-05 11:15:08,364] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default5]:[2022-09-05 11:15:08,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-05 11:15:08,455] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default2]:[2022-09-05 11:15:08,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. [default2]:[2022-09-05 11:15:08,687] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default3]:[2022-09-05 11:15:09,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
[default3]:[2022-09-05 11:15:09,744] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default7]:[2022-09-05 11:15:10,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. [default7]:[2022-09-05 11:15:10,208] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default6]:[2022-09-05 11:15:10,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. [default6]:[2022-09-05 11:15:10,409] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default5]:[2022-09-05 11:15:12,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. [default5]:[2022-09-05 11:15:12,757] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default4]:[2022-09-05 11:15:15,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. 
[default4]:[2022-09-05 11:15:15,211] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default1]:[2022-09-05 11:15:18,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-05 11:15:18,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default6]:[2022-09-05 11:15:18,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. [default6]:[2022-09-05 11:15:18,571] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default7]:[2022-09-05 11:15:18,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-05 11:15:18,662] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:time (ms) | save-checkpoint: 42165.98 [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! [default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now! 
[default6]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default1]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default5]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default3]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default2]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default7]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default0]:[2022-09-05 11:15:18,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[default0]:[2022-09-05 11:15:18,745] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step996/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[default0]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default0]: successfully saved checkpoint at iteration 996 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default4]:[2022-09-05 11:15:18,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step996 is ready now!
[default7]: iteration 997/ 3100 | consumed samples: 2041856 | consumed tokens: 4181721088 | elapsed time per iteration (s): 183.68 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.175508E-01 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 11.150 | TFLOPs: 113.82 |
[default7]: iteration 998/ 3100 | consumed samples: 2043904 | consumed tokens: 4185915392 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.247245E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.02 |
[default7]: iteration 999/ 3100 | consumed samples: 2045952 | consumed tokens: 4190109696 | elapsed time per iteration (s): 141.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.115957E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.464 | TFLOPs: 147.66 |
[default7]: iteration 1000/ 3100 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.144072E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.508 | TFLOPs: 148.10 |
[default7]:-----------------------------------------------------------------------------------------------------------
[default7]:validation_pretraining loss at iteration 1000 | lm loss value: 2.629892E+00 | lm loss PPL: 1.387228E+01 |
[default7]:-----------------------------------------------------------------------------------------------------------
[default7]:------------------------------------------------------------------------------------------
[default7]:valid loss at iteration 1000 | lm loss value: 1.340349E+00 | lm loss PPL: 3.820377E+00 |
[default7]:------------------------------------------------------------------------------------------
[default7]: iteration 1001/ 3100 | consumed samples: 2050048 | consumed tokens: 4198498304 | elapsed time per iteration (s): 228.59 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.166801E-01 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 8.959 | TFLOPs: 91.46 |
[default7]: iteration 1002/ 3100 | consumed samples: 2052096 | consumed tokens: 4202692608 | elapsed time per iteration (s): 141.18 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.161191E-01 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.506 | TFLOPs: 148.08 |
[default7]: iteration 1003/ 3100 | consumed samples: 2054144 | consumed tokens: 4206886912 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.120843E-01 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 |
[default7]: iteration 1004/ 3100 | consumed samples: 2056192 | consumed tokens: 4211081216 | elapsed time per iteration (s): 141.27 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.142606E-01 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.497 | TFLOPs: 147.99 |
[default7]: iteration 1005/ 3100 | consumed samples: 2058240 | consumed tokens: 4215275520 | elapsed time per iteration (s): 141.36 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.165030E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 |
[default7]: iteration 1006/ 3100 | consumed samples: 2060288 | consumed tokens: 4219469824 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.158648E-01 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 |
[default7]: iteration 1007/ 3100 | consumed samples: 2062336 | consumed tokens: 4223664128 | elapsed time per iteration (s): 141.07 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.153085E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.517 | TFLOPs: 148.20 |
[default7]: iteration 1008/ 3100 | consumed samples: 2064384 | consumed tokens: 4227858432 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.172530E-01 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.508 | TFLOPs: 148.10 |
[default7]: iteration 1009/ 3100 | consumed samples: 2066432 | consumed tokens: 4232052736 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.146887E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 1010/ 3100 | consumed samples: 2068480 | consumed tokens: 4236247040 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.189872E-01 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.79 |
[default7]: iteration 1011/ 3100 | consumed samples: 2070528 | consumed tokens: 4240441344 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.061271E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.411 | TFLOPs: 147.12 |
[default7]: iteration 1012/ 3100 | consumed samples: 2072576 | consumed tokens: 4244635648 | elapsed time per iteration (s): 141.30 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.123008E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.494 | TFLOPs: 147.96 |
[default7]: iteration 1013/ 3100 | consumed samples: 2074624 | consumed tokens: 4248829952 | elapsed time per iteration (s): 141.45 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.138465E-01 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.479 | TFLOPs: 147.81 |
[default7]: iteration 1014/ 3100 | consumed samples: 2076672 | consumed tokens: 4253024256 | elapsed time per iteration (s): 141.52 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.136845E-01 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.472 | TFLOPs: 147.73 |
[default7]: iteration 1015/ 3100 | consumed samples: 2078720 | consumed tokens: 4257218560 | elapsed time per iteration (s): 141.64 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.062279E-01 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.459 | TFLOPs: 147.60 |
[default7]: iteration 1016/ 3100 | consumed samples: 2080768 | consumed tokens: 4261412864 | elapsed time per iteration (s): 141.42 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.131301E-01 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.482 | TFLOPs: 147.84 |
[default7]: iteration 1017/ 3100 | consumed samples: 2082816 | consumed tokens: 4265607168 | elapsed time per iteration (s): 141.86 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.114852E-01 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.437 | TFLOPs: 147.38 |
[default7]: iteration 1018/ 3100 | consumed samples: 2084864 | consumed tokens: 4269801472 | elapsed time per iteration (s): 141.93 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.225891E-01 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.430 | TFLOPs: 147.31 |
[default7]: iteration 1019/ 3100 | consumed samples: 2086912 | consumed tokens: 4273995776 | elapsed time per iteration (s): 141.47 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.177025E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 |
[default7]: iteration 1020/ 3100 | consumed samples: 2088960 | consumed tokens: 4278190080 | elapsed time per iteration (s): 141.73 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.093725E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.450 | TFLOPs: 147.51 |
[default7]: iteration 1021/ 3100 | consumed samples: 2091008 | consumed tokens: 4282384384 | elapsed time per iteration (s): 142.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.063308E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.390 | TFLOPs: 146.90 |
[default7]: iteration 1022/ 3100 | consumed samples: 2093056 | consumed tokens: 4286578688 | elapsed time per iteration (s): 141.14 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.136893E-01 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.13 |
[default7]: iteration 1023/ 3100 | consumed samples: 2095104 | consumed tokens: 4290772992 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.141651E-01 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 |
[default7]: iteration 1024/ 3100 | consumed samples: 2097152 | consumed tokens: 4294967296 | elapsed time per iteration (s): 141.21 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.133502E-01 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 |
[default7]: iteration 1025/ 3100 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 141.24 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.075629E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.500 | TFLOPs: 148.02 |
[default7]: iteration 1026/ 3100 | consumed samples: 
2101248 | consumed tokens: 4303355904 | elapsed time per iteration (s): 141.79 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.135466E-01 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.444 | TFLOPs: 147.45 | [default7]: iteration 1027/ 3100 | consumed samples: 2103296 | consumed tokens: 4307550208 | elapsed time per iteration (s): 141.25 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.047476E-01 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.499 | TFLOPs: 148.01 | [default7]: iteration 1028/ 3100 | consumed samples: 2105344 | consumed tokens: 4311744512 | elapsed time per iteration (s): 141.12 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.133985E-01 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.512 | TFLOPs: 148.15 | [default7]: iteration 1029/ 3100 | consumed samples: 2107392 | consumed tokens: 4315938816 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.144342E-01 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 1030/ 3100 | consumed samples: 2109440 | consumed tokens: 4320133120 | elapsed time per iteration (s): 142.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.153284E-01 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.14 | [default7]: iteration 1031/ 3100 | consumed samples: 2111488 | consumed tokens: 4324327424 | elapsed time per iteration (s): 141.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.038853E-01 | grad norm: 0.399 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 14.504 | TFLOPs: 148.06 | [default7]: iteration 1032/ 3100 | consumed samples: 2113536 | consumed tokens: 4328521728 | elapsed time per iteration (s): 141.44 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.006732E-01 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.480 | TFLOPs: 147.82 | [default7]: iteration 1033/ 3100 | consumed samples: 2115584 | consumed tokens: 4332716032 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.142591E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.492 | TFLOPs: 147.94 | [default7]: iteration 1034/ 3100 | consumed samples: 2117632 | consumed tokens: 4336910336 | elapsed time per iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.095613E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.04 | [default7]: iteration 1035/ 3100 | consumed samples: 2119680 | consumed tokens: 4341104640 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.078567E-01 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.477 | TFLOPs: 147.79 | [default7]: iteration 1036/ 3100 | consumed samples: 2121728 | consumed tokens: 4345298944 | elapsed time per iteration (s): 141.40 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.131544E-01 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.484 | TFLOPs: 147.86 | [default7]: iteration 1037/ 3100 | consumed samples: 2123776 | consumed tokens: 4349493248 | elapsed time per iteration 
(s): 141.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.088702E-01 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.520 | TFLOPs: 148.23 | [default7]: iteration 1038/ 3100 | consumed samples: 2125824 | consumed tokens: 4353687552 | elapsed time per iteration (s): 141.33 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.077198E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.491 | TFLOPs: 147.93 | [default7]: iteration 1039/ 3100 | consumed samples: 2127872 | consumed tokens: 4357881856 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.047386E-01 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.11 | [default7]: iteration 1040/ 3100 | consumed samples: 2129920 | consumed tokens: 4362076160 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.109663E-01 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.489 | TFLOPs: 147.91 | [default7]: iteration 1041/ 3100 | consumed samples: 2131968 | consumed tokens: 4366270464 | elapsed time per iteration (s): 141.02 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.107647E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.523 | TFLOPs: 148.26 | [default7]: iteration 1042/ 3100 | consumed samples: 2134016 | consumed tokens: 4370464768 | elapsed time per iteration (s): 141.32 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.090399E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
14.492 | TFLOPs: 147.94 | [default7]: iteration 1043/ 3100 | consumed samples: 2136064 | consumed tokens: 4374659072 | elapsed time per iteration (s): 142.09 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.107411E-01 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.414 | TFLOPs: 147.14 | [default7]: iteration 1044/ 3100 | consumed samples: 2138112 | consumed tokens: 4378853376 | elapsed time per iteration (s): 140.96 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.107242E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.529 | TFLOPs: 148.32 | [default7]: iteration 1045/ 3100 | consumed samples: 2140160 | consumed tokens: 4383047680 | elapsed time per iteration (s): 141.71 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.046137E-01 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.452 | TFLOPs: 147.53 | [default7]: iteration 1046/ 3100 | consumed samples: 2142208 | consumed tokens: 4387241984 | elapsed time per iteration (s): 141.46 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.119405E-01 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.478 | TFLOPs: 147.79 | [default7]: iteration 1047/ 3100 | consumed samples: 2144256 | consumed tokens: 4391436288 | elapsed time per iteration (s): 141.99 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.046504E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.424 | TFLOPs: 147.24 | [default7]: iteration 1048/ 3100 | consumed samples: 2146304 | consumed tokens: 4395630592 | elapsed time per iteration (s): 140.94 | learning rate: 2.000E-05 | global batch size: 2048 | 
lm loss: 5.086635E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.531 | TFLOPs: 148.34 | [default7]: iteration 1049/ 3100 | consumed samples: 2148352 | consumed tokens: 4399824896 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.086347E-01 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 1050/ 3100 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 141.67 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.020953E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.456 | TFLOPs: 147.57 | [default7]: iteration 1051/ 3100 | consumed samples: 2152448 | consumed tokens: 4408213504 | elapsed time per iteration (s): 141.34 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.014085E-01 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.490 | TFLOPs: 147.92 | [default7]: iteration 1052/ 3100 | consumed samples: 2154496 | consumed tokens: 4412407808 | elapsed time per iteration (s): 141.29 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.075632E-01 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.495 | TFLOPs: 147.97 | [default7]: iteration 1053/ 3100 | consumed samples: 2156544 | consumed tokens: 4416602112 | elapsed time per iteration (s): 142.08 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.017628E-01 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.415 | TFLOPs: 147.15 | [default7]: iteration 1054/ 3100 | consumed 
samples: 2158592 | consumed tokens: 4420796416 | elapsed time per iteration (s): 141.00 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.050065E-01 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.525 | TFLOPs: 148.28 | [default7]: iteration 1055/ 3100 | consumed samples: 2160640 | consumed tokens: 4424990720 | elapsed time per iteration (s): 141.17 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.040823E-01 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.507 | TFLOPs: 148.09 | [default7]: iteration 1056/ 3100 | consumed samples: 2162688 | consumed tokens: 4429185024 | elapsed time per iteration (s): 142.05 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.158840E-01 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.418 | TFLOPs: 147.18 | [default7]: iteration 1057/ 3100 | consumed samples: 2164736 | consumed tokens: 4433379328 | elapsed time per iteration (s): 141.50 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.122672E-01 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.473 | TFLOPs: 147.75 | [default7]: iteration 1058/ 3100 | consumed samples: 2166784 | consumed tokens: 4437573632 | elapsed time per iteration (s): 141.43 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 4.964427E-01 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.481 | TFLOPs: 147.83 | [default7]: iteration 1059/ 3100 | consumed samples: 2168832 | consumed tokens: 4441767936 | elapsed time per iteration (s): 142.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.058709E-01 | grad norm: 0.352 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.411 | TFLOPs: 147.12 | [default7]: iteration 1060/ 3100 | consumed samples: 2170880 | consumed tokens: 4445962240 | elapsed time per iteration (s): 142.39 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 4.952371E-01 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.383 | TFLOPs: 146.83 | [default7]: iteration 1061/ 3100 | consumed samples: 2172928 | consumed tokens: 4450156544 | elapsed time per iteration (s): 141.35 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.111508E-01 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.488 | TFLOPs: 147.90 | [default7]: iteration 1062/ 3100 | consumed samples: 2174976 | consumed tokens: 4454350848 | elapsed time per iteration (s): 141.11 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 4.982380E-01 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.514 | TFLOPs: 148.16 | [default7]: iteration 1063/ 3100 | consumed samples: 2177024 | consumed tokens: 4458545152 | elapsed time per iteration (s): 142.48 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.019789E-01 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.374 | TFLOPs: 146.74 | [default7]: iteration 1064/ 3100 | consumed samples: 2179072 | consumed tokens: 4462739456 | elapsed time per iteration (s): 141.60 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.050920E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.463 | TFLOPs: 147.65 | [default7]: iteration 1065/ 3100 | consumed samples: 2181120 | consumed tokens: 4466933760 | elapsed time per 
iteration (s): 141.23 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.012780E-01 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.501 | TFLOPs: 148.03 | [default7]: iteration 1066/ 3100 | consumed samples: 2183168 | consumed tokens: 4471128064 | elapsed time per iteration (s): 142.20 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.049409E-01 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.402 | TFLOPs: 147.02 | [default7]: iteration 1067/ 3100 | consumed samples: 2185216 | consumed tokens: 4475322368 | elapsed time per iteration (s): 141.15 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 5.036104E-01 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.509 | TFLOPs: 148.12 | [default7]: iteration 1068/ 3100 | consumed samples: 2187264 | consumed tokens: 4479516672 | elapsed time per iteration (s): 141.13 | learning rate: 2.000E-05 | global batch size: 2048 | lm loss: 4.958695E-01 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 14.511 | TFLOPs: 148.14 | [default4]:[2022-09-05 14:06:31,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_59-model_00-model_states.pt... [default0]:saving checkpoint at iteration 1068 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]:[2022-09-05 14:06:31,723] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1068 is begin to save! 
[default4]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_37-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_52-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_25-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_19-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_58-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_18-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_33-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,802] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_29-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_21-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_11-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,850] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_13-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_16-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_48-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_42-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_40-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_50-model_00-model_states.pt... 
[default0]:[2022-09-05 14:06:31,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_22-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_47-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_04-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_28-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_69-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_06-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_71-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_49-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_09-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_67-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_68-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_35-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_44-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_15-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_63-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_66-model_00-model_states.pt... 
[default0]:[2022-09-05 14:06:31,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_12-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_39-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_30-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_65-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_14-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_36-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_05-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_20-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:31,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_71_model_states.pt... [default4]:[2022-09-05 14:06:31,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_71_model_states.pt. [default4]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_51-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_45-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_23-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_64-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_70-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_07-model_00-model_states.pt... 
[default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_56-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_54-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_34-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_60-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_53-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_57-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_26-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_72-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:31,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_27-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_03-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_62-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_55-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_24-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_01-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_08-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_41-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_43-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_46-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_10-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_38-model_00-model_states.pt... [default0]:[2022-09-05 14:06:31,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_32-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_17-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_31-model_00-model_states.pt... [default4]:[2022-09-05 14:06:31,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_61-model_00-model_states.pt... 
[default4]:[2022-09-05 14:06:34,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_29-model_00-model_states.pt. [default4]:[2022-09-05 14:06:34,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_27_model_states.pt... [default4]:[2022-09-05 14:06:34,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_27_model_states.pt. [default0]:[2022-09-05 14:06:35,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_22-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,046] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_20_model_states.pt... [default0]:[2022-09-05 14:06:35,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_20_model_states.pt. [default4]:[2022-09-05 14:06:35,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_27-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_25_model_states.pt... 
[default4]:[2022-09-05 14:06:35,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_25_model_states.pt. [default0]:[2022-09-05 14:06:35,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_70-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_68_model_states.pt... [default0]:[2022-09-05 14:06:35,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_68_model_states.pt. [default0]:[2022-09-05 14:06:35,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_38-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_36_model_states.pt... [default0]:[2022-09-05 14:06:35,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_36_model_states.pt. [default4]:[2022-09-05 14:06:35,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_19-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_17_model_states.pt... [default4]:[2022-09-05 14:06:35,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_17_model_states.pt. [default4]:[2022-09-05 14:06:35,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_25-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_23_model_states.pt... [default4]:[2022-09-05 14:06:35,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_23_model_states.pt. [default0]:[2022-09-05 14:06:35,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_28-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,202] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_26_model_states.pt... [default0]:[2022-09-05 14:06:35,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_26_model_states.pt. 
[default4]:[2022-09-05 14:06:35,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_71-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_69_model_states.pt... [default4]:[2022-09-05 14:06:35,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_69_model_states.pt. [default4]:[2022-09-05 14:06:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_51-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_49_model_states.pt... [default4]:[2022-09-05 14:06:35,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_49_model_states.pt. [default4]:[2022-09-05 14:06:35,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_23-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_21_model_states.pt... 
[default4]:[2022-09-05 14:06:35,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_21_model_states.pt. [default0]:[2022-09-05 14:06:35,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_26-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_24_model_states.pt... [default0]:[2022-09-05 14:06:35,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_24_model_states.pt. [default0]:[2022-09-05 14:06:35,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_72-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_74-model_00-model_states.pt... [default0]:[2022-09-05 14:06:35,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_74-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_70_model_states.pt... 
[default0]:[2022-09-05 14:06:35,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_70_model_states.pt. [default4]:[2022-09-05 14:06:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_03-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_01_model_states.pt... [default4]:[2022-09-05 14:06:35,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_01_model_states.pt. [default0]:[2022-09-05 14:06:35,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_24-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_22_model_states.pt... [default0]:[2022-09-05 14:06:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_22_model_states.pt. [default4]:[2022-09-05 14:06:35,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_41-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_39_model_states.pt... [default4]:[2022-09-05 14:06:35,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_39_model_states.pt. [default0]:[2022-09-05 14:06:35,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_18-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,309] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_16_model_states.pt... [default0]:[2022-09-05 14:06:35,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_16_model_states.pt. [default0]:[2022-09-05 14:06:35,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_06-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,363] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_04_model_states.pt... [default4]:[2022-09-05 14:06:35,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_09-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,315] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_07_model_states.pt... [default4]:[2022-09-05 14:06:35,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_07_model_states.pt. [default4]:[2022-09-05 14:06:35,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_67-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_65_model_states.pt... [default4]:[2022-09-05 14:06:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_65-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_63_model_states.pt... [default4]:[2022-09-05 14:06:35,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_63_model_states.pt. [default4]:[2022-09-05 14:06:35,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_05-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_03_model_states.pt... [default4]:[2022-09-05 14:06:35,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_03_model_states.pt. [default0]:[2022-09-05 14:06:35,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_64-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_62_model_states.pt... [default0]:[2022-09-05 14:06:35,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_62_model_states.pt. [default0]:[2022-09-05 14:06:35,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_08-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_06_model_states.pt... [default0]:[2022-09-05 14:06:35,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_06_model_states.pt. 
[default4]:[2022-09-05 14:06:35,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_37-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_35_model_states.pt... [default4]:[2022-09-05 14:06:35,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_35_model_states.pt. [default4]:[2022-09-05 14:06:35,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_21-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_19_model_states.pt... [default4]:[2022-09-05 14:06:35,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_13-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,427] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_11_model_states.pt... [default4]:[2022-09-05 14:06:35,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_11_model_states.pt. 
[default0]:[2022-09-05 14:06:35,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_42-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_40_model_states.pt... [default0]:[2022-09-05 14:06:35,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_40_model_states.pt. [default0]:[2022-09-05 14:06:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_40-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,390] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_38_model_states.pt... [default0]:[2022-09-05 14:06:35,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_38_model_states.pt. [default0]:[2022-09-05 14:06:35,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_04-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_02_model_states.pt... 
[default0]:[2022-09-05 14:06:35,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_02_model_states.pt. [default0]:[2022-09-05 14:06:35,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_04_model_states.pt. [default4]:[2022-09-05 14:06:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_65_model_states.pt. [default4]:[2022-09-05 14:06:35,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_15-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_13_model_states.pt... [default4]:[2022-09-05 14:06:35,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_13_model_states.pt. [default0]:[2022-09-05 14:06:35,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_66-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_64_model_states.pt... [default0]:[2022-09-05 14:06:35,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_64_model_states.pt. 
[default0]:[2022-09-05 14:06:35,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_12-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_10_model_states.pt... [default0]:[2022-09-05 14:06:35,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_10_model_states.pt. [default4]:[2022-09-05 14:06:35,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_39-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_37_model_states.pt... [default4]:[2022-09-05 14:06:35,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_37_model_states.pt. [default0]:[2022-09-05 14:06:35,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_14-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_12_model_states.pt... 
[default0]:[2022-09-05 14:06:35,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_12_model_states.pt. [default0]:[2022-09-05 14:06:35,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_36-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_34_model_states.pt... [default0]:[2022-09-05 14:06:35,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_34_model_states.pt. [default4]:[2022-09-05 14:06:35,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_45-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_43_model_states.pt... [default4]:[2022-09-05 14:06:35,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_43_model_states.pt. [default0]:[2022-09-05 14:06:35,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_54-model_00-model_states.pt. 
[default0]:[2022-09-05 14:06:35,482] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_52_model_states.pt... [default0]:[2022-09-05 14:06:35,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_52_model_states.pt. [default4]:[2022-09-05 14:06:35,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_55-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_53_model_states.pt... [default4]:[2022-09-05 14:06:35,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_53_model_states.pt. [default4]:[2022-09-05 14:06:35,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_43-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,525] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_41_model_states.pt... [default4]:[2022-09-05 14:06:35,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_41_model_states.pt. [default4]:[2022-09-05 14:06:35,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_19_model_states.pt. 
[default0]:[2022-09-05 14:06:35,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_50-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,522] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_48_model_states.pt... [default0]:[2022-09-05 14:06:35,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_48_model_states.pt. [default0]:[2022-09-05 14:06:35,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_68-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_66_model_states.pt... [default0]:[2022-09-05 14:06:35,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_66_model_states.pt. [default0]:[2022-09-05 14:06:35,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_44-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_42_model_states.pt... 
[default0]:[2022-09-05 14:06:35,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_42_model_states.pt. [default4]:[2022-09-05 14:06:35,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_63-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_61_model_states.pt... [default4]:[2022-09-05 14:06:35,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_61_model_states.pt. [default0]:[2022-09-05 14:06:35,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_20-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_18_model_states.pt... [default0]:[2022-09-05 14:06:35,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_18_model_states.pt. [default4]:[2022-09-05 14:06:35,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_07-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_05_model_states.pt... [default4]:[2022-09-05 14:06:35,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_05_model_states.pt. [default0]:[2022-09-05 14:06:35,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_34-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_32_model_states.pt... [default4]:[2022-09-05 14:06:35,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_53-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_51_model_states.pt... [default4]:[2022-09-05 14:06:35,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_51_model_states.pt. [default0]:[2022-09-05 14:06:35,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_62-model_00-model_states.pt. 
[default0]:[2022-09-05 14:06:35,538] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_60_model_states.pt... [default0]:[2022-09-05 14:06:35,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_60_model_states.pt. [default0]:[2022-09-05 14:06:35,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_46-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_44_model_states.pt... [default0]:[2022-09-05 14:06:35,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_44_model_states.pt. [default4]:[2022-09-05 14:06:35,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_31-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_29_model_states.pt... [default4]:[2022-09-05 14:06:35,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_29_model_states.pt. 
[default4]:[2022-09-05 14:06:35,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_61-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,583] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_59_model_states.pt... [default4]:[2022-09-05 14:06:35,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_59_model_states.pt. [default0]:[2022-09-05 14:06:35,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_52-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_50_model_states.pt... [default0]:[2022-09-05 14:06:35,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_50_model_states.pt. [default0]:[2022-09-05 14:06:35,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_58-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_56_model_states.pt... 
[default4]:[2022-09-05 14:06:35,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_33-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_31_model_states.pt... [default4]:[2022-09-05 14:06:35,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_31_model_states.pt. [default0]:[2022-09-05 14:06:35,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_48-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_46_model_states.pt... [default0]:[2022-09-05 14:06:35,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_46_model_states.pt. [default4]:[2022-09-05 14:06:35,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_47-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_45_model_states.pt... 
[default4]:[2022-09-05 14:06:35,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_45_model_states.pt. [default4]:[2022-09-05 14:06:35,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_69-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_67_model_states.pt... [default4]:[2022-09-05 14:06:35,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_67_model_states.pt. [default4]:[2022-09-05 14:06:35,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_49-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_47_model_states.pt... [default4]:[2022-09-05 14:06:35,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_47_model_states.pt. [default4]:[2022-09-05 14:06:35,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_35-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,641] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_33_model_states.pt... [default4]:[2022-09-05 14:06:35,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_33_model_states.pt. [default0]:[2022-09-05 14:06:35,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_30-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_28_model_states.pt... [default0]:[2022-09-05 14:06:35,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_28_model_states.pt. [default0]:[2022-09-05 14:06:35,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_56-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,670] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_54_model_states.pt... [default0]:[2022-09-05 14:06:35,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_54_model_states.pt. [default0]:[2022-09-05 14:06:35,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_32_model_states.pt. 
[default0]:[2022-09-05 14:06:35,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_60-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_58_model_states.pt... [default0]:[2022-09-05 14:06:35,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_58_model_states.pt. [default4]:[2022-09-05 14:06:35,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_57-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,623] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_55_model_states.pt... [default4]:[2022-09-05 14:06:35,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_55_model_states.pt. [default4]:[2022-09-05 14:06:35,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_59-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_57_model_states.pt... 
[default4]:[2022-09-05 14:06:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_57_model_states.pt. [default0]:[2022-09-05 14:06:35,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_10-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_08_model_states.pt... [default0]:[2022-09-05 14:06:35,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_08_model_states.pt. [default0]:[2022-09-05 14:06:35,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_32-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,678] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_30_model_states.pt... [default0]:[2022-09-05 14:06:35,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_30_model_states.pt. [default4]:[2022-09-05 14:06:35,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_17-model_00-model_states.pt. 
[default4]:[2022-09-05 14:06:35,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_15_model_states.pt... [default4]:[2022-09-05 14:06:35,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_15_model_states.pt. [default0]:[2022-09-05 14:06:35,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_56_model_states.pt. [default4]:[2022-09-05 14:06:35,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_11-model_00-model_states.pt. [default4]:[2022-09-05 14:06:35,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_09_model_states.pt... [default4]:[2022-09-05 14:06:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_09_model_states.pt. [default0]:[2022-09-05 14:06:35,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_16-model_00-model_states.pt. [default0]:[2022-09-05 14:06:35,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_14_model_states.pt... [default0]:[2022-09-05 14:06:35,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_14_model_states.pt. 
[default0]:[2022-09-05 14:06:36,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/layer_01-model_00-model_states.pt. [default0]:[2022-09-05 14:06:36,449] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_00_model_states.pt [default0]:[2022-09-05 14:06:36,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:06:36,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt... 
[default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt... 
[default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt... 
[default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt... 
[default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt... 
[default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
[default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt... 
[default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt... 
[default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt... 
[default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt... 
[default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt... 
[default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt... 
[default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt... 
[default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt... 
[default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt... 
[default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt... 
[default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt... 
[default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt... [default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt... [default1]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt... 
[default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt... [default4]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt... [default3]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt... [default2]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt... 
[default0]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt... [default5]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt... [default7]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt... [default6]:[2022-09-05 14:06:36,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt... [default4]:[2022-09-05 14:06:42,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt. [default4]:[2022-09-05 14:06:42,585] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_25_optim_states.pt [default7]:[2022-09-05 14:06:43,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt. 
[default7]:[2022-09-05 14:06:43,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_25_optim_states.pt [default3]:[2022-09-05 14:06:43,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt. [default3]:[2022-09-05 14:06:43,433] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_24_optim_states.pt [default2]:[2022-09-05 14:06:43,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt. [default2]:[2022-09-05 14:06:43,606] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_24_optim_states.pt [default0]:[2022-09-05 14:06:43,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt. [default0]:[2022-09-05 14:06:43,644] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_40_optim_states.pt [default0]:[2022-09-05 14:06:43,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt. 
[default0]:[2022-09-05 14:06:43,735] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_44_optim_states.pt [default5]:[2022-09-05 14:06:43,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt. [default5]:[2022-09-05 14:06:43,707] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_25_optim_states.pt [default2]:[2022-09-05 14:06:43,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt. [default2]:[2022-09-05 14:06:43,757] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_04_optim_states.pt [default0]:[2022-09-05 14:06:43,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt. [default0]:[2022-09-05 14:06:43,710] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_04_optim_states.pt [default3]:[2022-09-05 14:06:44,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt. 
[default3]:[2022-09-05 14:06:44,036] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_40_optim_states.pt [default3]:[2022-09-05 14:06:44,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt. [default3]:[2022-09-05 14:06:44,136] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_44_optim_states.pt [default1]:[2022-09-05 14:06:44,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt. [default1]:[2022-09-05 14:06:44,124] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_04_optim_states.pt [default3]:[2022-09-05 14:06:44,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt. [default3]:[2022-09-05 14:06:44,196] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_60_optim_states.pt [default1]:[2022-09-05 14:06:44,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt. 
[default1]:[2022-09-05 14:06:44,243] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_24_optim_states.pt [default0]:[2022-09-05 14:06:44,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt. [default0]:[2022-09-05 14:06:44,213] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_18_optim_states.pt [default1]:[2022-09-05 14:06:44,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt. [default1]:[2022-09-05 14:06:44,306] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_48_optim_states.pt [default3]:[2022-09-05 14:06:44,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt. [default3]:[2022-09-05 14:06:44,275] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_04_optim_states.pt [default5]:[2022-09-05 14:06:44,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt. 
[default5]:[2022-09-05 14:06:44,268] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_05_optim_states.pt [default4]:[2022-09-05 14:06:44,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt. [default4]:[2022-09-05 14:06:44,373] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_45_optim_states.pt [default7]:[2022-09-05 14:06:44,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt. [default7]:[2022-09-05 14:06:44,326] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_45_optim_states.pt [default1]:[2022-09-05 14:06:44,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt. [default1]:[2022-09-05 14:06:44,297] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_14_optim_states.pt [default4]:[2022-09-05 14:06:44,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt. 
[default4]:[2022-09-05 14:06:44,423] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_05_optim_states.pt [default0]:[2022-09-05 14:06:44,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt. [default0]:[2022-09-05 14:06:44,338] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_24_optim_states.pt [default6]:[2022-09-05 14:06:44,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt. [default6]:[2022-09-05 14:06:44,416] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_25_optim_states.pt [default0]:[2022-09-05 14:06:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt. [default0]:[2022-09-05 14:06:44,395] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_58_optim_states.pt [default2]:[2022-09-05 14:06:44,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt. 
[default2]:[2022-09-05 14:06:44,530] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_48_optim_states.pt [default1]:[2022-09-05 14:06:44,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt. [default1]:[2022-09-05 14:06:44,487] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_20_optim_states.pt [default0]:[2022-09-05 14:06:44,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt. [default0]:[2022-09-05 14:06:44,543] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_14_optim_states.pt [default3]:[2022-09-05 14:06:44,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt. [default3]:[2022-09-05 14:06:44,521] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_58_optim_states.pt [default3]:[2022-09-05 14:06:44,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt. 
[default3]:[2022-09-05 14:06:44,549] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_14_optim_states.pt [default2]:[2022-09-05 14:06:44,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt. [default2]:[2022-09-05 14:06:44,554] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_66_optim_states.pt [default1]:[2022-09-05 14:06:44,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt. [default1]:[2022-09-05 14:06:44,609] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_58_optim_states.pt [default0]:[2022-09-05 14:06:44,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt. [default0]:[2022-09-05 14:06:44,592] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_36_optim_states.pt [default5]:[2022-09-05 14:06:44,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt. 
[default5]:[2022-09-05 14:06:44,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_49_optim_states.pt [default0]:[2022-09-05 14:06:44,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt. [default0]:[2022-09-05 14:06:44,706] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_48_optim_states.pt [default4]:[2022-09-05 14:06:44,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt. [default4]:[2022-09-05 14:06:44,760] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_11_optim_states.pt [default0]:[2022-09-05 14:06:44,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt. [default0]:[2022-09-05 14:06:44,741] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_20_optim_states.pt [default1]:[2022-09-05 14:06:44,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt. 
[default1]:[2022-09-05 14:06:44,790] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_40_optim_states.pt [default4]:[2022-09-05 14:06:44,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt. [default4]:[2022-09-05 14:06:44,762] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_61_optim_states.pt [default0]:[2022-09-05 14:06:44,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt. [default0]:[2022-09-05 14:06:44,729] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_28_optim_states.pt [default5]:[2022-09-05 14:06:44,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt. [default5]:[2022-09-05 14:06:44,822] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_67_optim_states.pt [default5]:[2022-09-05 14:06:44,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt. 
[default5]:[2022-09-05 14:06:44,881] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_51_optim_states.pt [default0]:[2022-09-05 14:06:44,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt. [default0]:[2022-09-05 14:06:44,803] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_66_optim_states.pt [default6]:[2022-09-05 14:06:44,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt. [default6]:[2022-09-05 14:06:44,901] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_43_optim_states.pt [default2]:[2022-09-05 14:06:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt. [default2]:[2022-09-05 14:06:44,933] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_58_optim_states.pt [default6]:[2022-09-05 14:06:44,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt. 
[default6]:[2022-09-05 14:06:44,943] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_45_optim_states.pt [default1]:[2022-09-05 14:06:44,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt. [default1]:[2022-09-05 14:06:44,970] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_66_optim_states.pt [default4]:[2022-09-05 14:06:44,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt. [default4]:[2022-09-05 14:06:44,974] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_65_optim_states.pt [default7]:[2022-09-05 14:06:44,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt. [default7]:[2022-09-05 14:06:44,953] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_15_optim_states.pt [default5]:[2022-09-05 14:06:44,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt. 
[default5]:[2022-09-05 14:06:44,965] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_15_optim_states.pt [default1]:[2022-09-05 14:06:44,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt. [default1]:[2022-09-05 14:06:44,984] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_18_optim_states.pt [default3]:[2022-09-05 14:06:44,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt. [default3]:[2022-09-05 14:06:44,994] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_10_optim_states.pt [default4]:[2022-09-05 14:06:44,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt. [default4]:[2022-09-05 14:06:44,956] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_41_optim_states.pt [default4]:[2022-09-05 14:06:45,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt. 
[default4]:[2022-09-05 14:06:45,080] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_69_optim_states.pt [default6]:[2022-09-05 14:06:45,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt. [default6]:[2022-09-05 14:06:45,023] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_41_optim_states.pt [default3]:[2022-09-05 14:06:45,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt. [default3]:[2022-09-05 14:06:45,041] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_50_optim_states.pt [default2]:[2022-09-05 14:06:45,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt. [default2]:[2022-09-05 14:06:45,092] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_10_optim_states.pt [default0]:[2022-09-05 14:06:44,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt. 
[default0]:[2022-09-05 14:06:44,998] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_64_optim_states.pt [default2]:[2022-09-05 14:06:45,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt. [default2]:[2022-09-05 14:06:45,071] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_60_optim_states.pt [default2]:[2022-09-05 14:06:45,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt. [default2]:[2022-09-05 14:06:45,100] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_44_optim_states.pt [default4]:[2022-09-05 14:06:45,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt. [default4]:[2022-09-05 14:06:45,103] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_49_optim_states.pt [default4]:[2022-09-05 14:06:45,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt. 
[default4]:[2022-09-05 14:06:45,097] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_21_optim_states.pt [default2]:[2022-09-05 14:06:45,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt. [default2]:[2022-09-05 14:06:45,087] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_28_optim_states.pt [default3]:[2022-09-05 14:06:45,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt. [default3]:[2022-09-05 14:06:45,162] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_66_optim_states.pt [default6]:[2022-09-05 14:06:45,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt. [default6]:[2022-09-05 14:06:45,103] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_15_optim_states.pt [default6]:[2022-09-05 14:06:45,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,140] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_11_optim_states.pt [default7]:[2022-09-05 14:06:45,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt. [default7]:[2022-09-05 14:06:45,170] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_41_optim_states.pt [default0]:[2022-09-05 14:06:45,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt. [default0]:[2022-09-05 14:06:45,274] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_10_optim_states.pt [default2]:[2022-09-05 14:06:45,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt. [default2]:[2022-09-05 14:06:45,245] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_20_optim_states.pt [default4]:[2022-09-05 14:06:45,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt. 
[default4]:[2022-09-05 14:06:45,305] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_43_optim_states.pt [default7]:[2022-09-05 14:06:45,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt. [default7]:[2022-09-05 14:06:45,290] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_59_optim_states.pt [default2]:[2022-09-05 14:06:45,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt. [default2]:[2022-09-05 14:06:45,282] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_40_optim_states.pt [default6]:[2022-09-05 14:06:45,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt. [default6]:[2022-09-05 14:06:45,322] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_67_optim_states.pt [default4]:[2022-09-05 14:06:45,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt. 
[default4]:[2022-09-05 14:06:45,254] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_15_optim_states.pt [default5]:[2022-09-05 14:06:45,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt. [default5]:[2022-09-05 14:06:45,264] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_59_optim_states.pt [default2]:[2022-09-05 14:06:45,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt. [default2]:[2022-09-05 14:06:45,368] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_18_optim_states.pt [default7]:[2022-09-05 14:06:45,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt. [default7]:[2022-09-05 14:06:45,366] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_67_optim_states.pt [default0]:[2022-09-05 14:06:45,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt. 
[default0]:[2022-09-05 14:06:45,427] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_32_optim_states.pt [default5]:[2022-09-05 14:06:45,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt. [default5]:[2022-09-05 14:06:45,413] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_41_optim_states.pt [default1]:[2022-09-05 14:06:45,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt. [default1]:[2022-09-05 14:06:45,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_60_optim_states.pt [default7]:[2022-09-05 14:06:45,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt. [default7]:[2022-09-05 14:06:45,422] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_21_optim_states.pt [default6]:[2022-09-05 14:06:45,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,450] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_59_optim_states.pt [default6]:[2022-09-05 14:06:45,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt. [default6]:[2022-09-05 14:06:45,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_05_optim_states.pt [default1]:[2022-09-05 14:06:45,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt. [default1]:[2022-09-05 14:06:45,416] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_44_optim_states.pt [default0]:[2022-09-05 14:06:45,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt. [default0]:[2022-09-05 14:06:45,399] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_46_optim_states.pt [default3]:[2022-09-05 14:06:45,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt. 
[default3]:[2022-09-05 14:06:45,390] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_20_optim_states.pt [default4]:[2022-09-05 14:06:45,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt. [default4]:[2022-09-05 14:06:45,466] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_67_optim_states.pt [default2]:[2022-09-05 14:06:45,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt. [default2]:[2022-09-05 14:06:45,481] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_34_optim_states.pt [default6]:[2022-09-05 14:06:45,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt. [default6]:[2022-09-05 14:06:45,487] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_49_optim_states.pt [default7]:[2022-09-05 14:06:45,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt. 
[default7]:[2022-09-05 14:06:45,422] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_43_optim_states.pt [default6]:[2022-09-05 14:06:45,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt. [default6]:[2022-09-05 14:06:45,483] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_09_optim_states.pt [default1]:[2022-09-05 14:06:45,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt. [default1]:[2022-09-05 14:06:45,452] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_10_optim_states.pt [default6]:[2022-09-05 14:06:45,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt. [default6]:[2022-09-05 14:06:45,420] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_21_optim_states.pt [default6]:[2022-09-05 14:06:45,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,461] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_65_optim_states.pt [default0]:[2022-09-05 14:06:45,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt. [default0]:[2022-09-05 14:06:45,490] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_62_optim_states.pt [default4]:[2022-09-05 14:06:45,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt. [default4]:[2022-09-05 14:06:45,453] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_51_optim_states.pt [default4]:[2022-09-05 14:06:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt. [default4]:[2022-09-05 14:06:45,474] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_29_optim_states.pt [default5]:[2022-09-05 14:06:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt. 
[default5]:[2022-09-05 14:06:45,552] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_43_optim_states.pt [default7]:[2022-09-05 14:06:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt. [default7]:[2022-09-05 14:06:45,478] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_51_optim_states.pt [default5]:[2022-09-05 14:06:45,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt. [default5]:[2022-09-05 14:06:45,529] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_45_optim_states.pt [default3]:[2022-09-05 14:06:45,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt. [default3]:[2022-09-05 14:06:45,514] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_64_optim_states.pt [default0]:[2022-09-05 14:06:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt. 
[default0]:[2022-09-05 14:06:45,608] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_34_optim_states.pt [default3]:[2022-09-05 14:06:45,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt. [default3]:[2022-09-05 14:06:45,563] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_48_optim_states.pt [default7]:[2022-09-05 14:06:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt. [default7]:[2022-09-05 14:06:45,607] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_05_optim_states.pt [default5]:[2022-09-05 14:06:45,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt. [default5]:[2022-09-05 14:06:45,574] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_21_optim_states.pt [default3]:[2022-09-05 14:06:45,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt. 
[default3]:[2022-09-05 14:06:45,661] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_42_optim_states.pt [default7]:[2022-09-05 14:06:45,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt. [default7]:[2022-09-05 14:06:45,667] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_37_optim_states.pt [default4]:[2022-09-05 14:06:45,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt. [default4]:[2022-09-05 14:06:45,638] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_63_optim_states.pt [default1]:[2022-09-05 14:06:45,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt. [default1]:[2022-09-05 14:06:45,636] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_62_optim_states.pt [default6]:[2022-09-05 14:06:45,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,725] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_51_optim_states.pt [default1]:[2022-09-05 14:06:45,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt. [default1]:[2022-09-05 14:06:45,671] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_50_optim_states.pt [default3]:[2022-09-05 14:06:45,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt. [default3]:[2022-09-05 14:06:45,698] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_34_optim_states.pt [default4]:[2022-09-05 14:06:45,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt. [default4]:[2022-09-05 14:06:45,754] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_59_optim_states.pt [default7]:[2022-09-05 14:06:45,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt. 
[default7]:[2022-09-05 14:06:45,716] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_09_optim_states.pt [default5]:[2022-09-05 14:06:45,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt. [default5]:[2022-09-05 14:06:45,700] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_11_optim_states.pt [default2]:[2022-09-05 14:06:45,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt. [default2]:[2022-09-05 14:06:45,740] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_50_optim_states.pt [default4]:[2022-09-05 14:06:45,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt. [default4]:[2022-09-05 14:06:45,732] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_33_optim_states.pt [default6]:[2022-09-05 14:06:45,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,752] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_47_optim_states.pt [default6]:[2022-09-05 14:06:45,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt. [default6]:[2022-09-05 14:06:45,821] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_35_optim_states.pt [default7]:[2022-09-05 14:06:45,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt. [default7]:[2022-09-05 14:06:45,767] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_29_optim_states.pt [default4]:[2022-09-05 14:06:45,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt. [default4]:[2022-09-05 14:06:45,828] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_09_optim_states.pt [default6]:[2022-09-05 14:06:45,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt. 
[default6]:[2022-09-05 14:06:45,822] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_61_optim_states.pt [default5]:[2022-09-05 14:06:45,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt. [default5]:[2022-09-05 14:06:45,879] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_69_optim_states.pt [default2]:[2022-09-05 14:06:45,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt. [default2]:[2022-09-05 14:06:45,881] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_42_optim_states.pt [default5]:[2022-09-05 14:06:45,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt. [default5]:[2022-09-05 14:06:45,830] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_33_optim_states.pt [default7]:[2022-09-05 14:06:45,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt. 
[default7]:[2022-09-05 14:06:45,844] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_49_optim_states.pt [default0]:[2022-09-05 14:06:45,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt. [default0]:[2022-09-05 14:06:45,819] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_42_optim_states.pt [default5]:[2022-09-05 14:06:45,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt. [default5]:[2022-09-05 14:06:45,901] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_61_optim_states.pt [default2]:[2022-09-05 14:06:45,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt. [default2]:[2022-09-05 14:06:45,812] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_62_optim_states.pt [default0]:[2022-09-05 14:06:45,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt. 
[default0]:[2022-09-05 14:06:45,881] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_22_optim_states.pt [default4]:[2022-09-05 14:06:45,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt. [default4]:[2022-09-05 14:06:45,941] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_35_optim_states.pt [default7]:[2022-09-05 14:06:45,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt. [default7]:[2022-09-05 14:06:45,946] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_19_optim_states.pt [default5]:[2022-09-05 14:06:45,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt. [default5]:[2022-09-05 14:06:45,920] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_35_optim_states.pt [default5]:[2022-09-05 14:06:45,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt. 
[default5]:[2022-09-05 14:06:45,989] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_29_optim_states.pt [default5]:[2022-09-05 14:06:45,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt. [default5]:[2022-09-05 14:06:45,918] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_09_optim_states.pt [default4]:[2022-09-05 14:06:45,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt. [default4]:[2022-09-05 14:06:45,914] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_37_optim_states.pt [default2]:[2022-09-05 14:06:45,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt. [default2]:[2022-09-05 14:06:45,942] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_22_optim_states.pt [default1]:[2022-09-05 14:06:45,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt. 
[default1]:[2022-09-05 14:06:45,939] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_64_optim_states.pt [default6]:[2022-09-05 14:06:45,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt. [default6]:[2022-09-05 14:06:45,964] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_23_optim_states.pt [default0]:[2022-09-05 14:06:45,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt. [default0]:[2022-09-05 14:06:45,947] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_60_optim_states.pt [default5]:[2022-09-05 14:06:45,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt. [default5]:[2022-09-05 14:06:45,999] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_65_optim_states.pt [default2]:[2022-09-05 14:06:46,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt. 
[default2]:[2022-09-05 14:06:46,056] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_08_optim_states.pt [default4]:[2022-09-05 14:06:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt. [default4]:[2022-09-05 14:06:46,048] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_47_optim_states.pt [default1]:[2022-09-05 14:06:46,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt. [default1]:[2022-09-05 14:06:46,076] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_46_optim_states.pt [default3]:[2022-09-05 14:06:45,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt. [default3]:[2022-09-05 14:06:45,999] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_28_optim_states.pt [default1]:[2022-09-05 14:06:46,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt. 
[default1]:[2022-09-05 14:06:46,052] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_36_optim_states.pt [default2]:[2022-09-05 14:06:46,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt. [default2]:[2022-09-05 14:06:46,099] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_64_optim_states.pt [default7]:[2022-09-05 14:06:46,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt. [default7]:[2022-09-05 14:06:46,007] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_61_optim_states.pt [default3]:[2022-09-05 14:06:46,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt. [default3]:[2022-09-05 14:06:46,066] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_18_optim_states.pt [default6]:[2022-09-05 14:06:46,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt. 
[default6]:[2022-09-05 14:06:46,045] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_29_optim_states.pt [default1]:[2022-09-05 14:06:46,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt. [default1]:[2022-09-05 14:06:46,084] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_34_optim_states.pt [default7]:[2022-09-05 14:06:46,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt. [default7]:[2022-09-05 14:06:46,057] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_35_optim_states.pt [default0]:[2022-09-05 14:06:46,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt. [default0]:[2022-09-05 14:06:46,087] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_50_optim_states.pt [default4]:[2022-09-05 14:06:46,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt. 
[default4]:[2022-09-05 14:06:46,165] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_19_optim_states.pt [default6]:[2022-09-05 14:06:46,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt. [default6]:[2022-09-05 14:06:46,156] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_69_optim_states.pt [default3]:[2022-09-05 14:06:46,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt. [default3]:[2022-09-05 14:06:46,114] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_46_optim_states.pt [default7]:[2022-09-05 14:06:46,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt. [default7]:[2022-09-05 14:06:46,136] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_69_optim_states.pt [default7]:[2022-09-05 14:06:46,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt. 
[default7]:[2022-09-05 14:06:46,142] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_63_optim_states.pt [default7]:[2022-09-05 14:06:46,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt. [default7]:[2022-09-05 14:06:46,134] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_11_optim_states.pt [default6]:[2022-09-05 14:06:46,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt. [default6]:[2022-09-05 14:06:46,130] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_19_optim_states.pt [default5]:[2022-09-05 14:06:46,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt. [default5]:[2022-09-05 14:06:46,197] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_19_optim_states.pt [default0]:[2022-09-05 14:06:46,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt. 
[default0]:[2022-09-05 14:06:46,158] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_08_optim_states.pt [default6]:[2022-09-05 14:06:46,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt. [default6]:[2022-09-05 14:06:46,267] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_37_optim_states.pt [default1]:[2022-09-05 14:06:46,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt. [default1]:[2022-09-05 14:06:46,279] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_28_optim_states.pt [default7]:[2022-09-05 14:06:46,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt. [default7]:[2022-09-05 14:06:46,261] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_23_optim_states.pt [default5]:[2022-09-05 14:06:46,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt. 
[default5]:[2022-09-05 14:06:46,218] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_63_optim_states.pt [default1]:[2022-09-05 14:06:46,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt. [default1]:[2022-09-05 14:06:46,216] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_08_optim_states.pt [default3]:[2022-09-05 14:06:46,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt. [default3]:[2022-09-05 14:06:46,275] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_36_optim_states.pt [default3]:[2022-09-05 14:06:46,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt. [default3]:[2022-09-05 14:06:46,381] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_22_optim_states.pt [default6]:[2022-09-05 14:06:46,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt. 
[default6]:[2022-09-05 14:06:46,372] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_63_optim_states.pt [default3]:[2022-09-05 14:06:46,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt. [default3]:[2022-09-05 14:06:46,406] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_62_optim_states.pt [default1]:[2022-09-05 14:06:46,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt. [default1]:[2022-09-05 14:06:46,439] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_22_optim_states.pt [default5]:[2022-09-05 14:06:46,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt. [default5]:[2022-09-05 14:06:46,366] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_37_optim_states.pt [default7]:[2022-09-05 14:06:46,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt. 
[default7]:[2022-09-05 14:06:46,399] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_65_optim_states.pt [default7]:[2022-09-05 14:06:46,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt. [default7]:[2022-09-05 14:06:46,490] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_33_optim_states.pt [default5]:[2022-09-05 14:06:46,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt. [default5]:[2022-09-05 14:06:46,440] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_23_optim_states.pt [default5]:[2022-09-05 14:06:46,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt. [default5]:[2022-09-05 14:06:46,506] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_47_optim_states.pt [default3]:[2022-09-05 14:06:46,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt. 
[default3]:[2022-09-05 14:06:46,505] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_32_optim_states.pt [default2]:[2022-09-05 14:06:46,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt. [default2]:[2022-09-05 14:06:46,446] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_36_optim_states.pt [default7]:[2022-09-05 14:06:46,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt. [default7]:[2022-09-05 14:06:46,514] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_47_optim_states.pt [default2]:[2022-09-05 14:06:46,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt. [default2]:[2022-09-05 14:06:46,613] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_68_optim_states.pt [default0]:[2022-09-05 14:06:46,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt. 
[default0]:[2022-09-05 14:06:46,724] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_68_optim_states.pt [default2]:[2022-09-05 14:06:46,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt. [default2]:[2022-09-05 14:06:46,721] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_32_optim_states.pt [default6]:[2022-09-05 14:06:46,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt. [default6]:[2022-09-05 14:06:46,859] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_33_optim_states.pt [default1]:[2022-09-05 14:06:46,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt. [default1]:[2022-09-05 14:06:46,923] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_68_optim_states.pt [default3]:[2022-09-05 14:06:46,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt. 
[default3]:[2022-09-05 14:06:46,903] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_68_optim_states.pt [default2]:[2022-09-05 14:06:47,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt. [default2]:[2022-09-05 14:06:47,077] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_56_optim_states.pt [default1]:[2022-09-05 14:06:47,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt. [default1]:[2022-09-05 14:06:47,205] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_32_optim_states.pt [default4]:[2022-09-05 14:06:47,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt. [default4]:[2022-09-05 14:06:47,353] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_23_optim_states.pt [default4]:[2022-09-05 14:06:47,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt. 
[default4]:[2022-09-05 14:06:47,552] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_07_optim_states.pt [default5]:[2022-09-05 14:06:47,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt. [default5]:[2022-09-05 14:06:47,594] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_53_optim_states.pt [default3]:[2022-09-05 14:06:47,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt. [default3]:[2022-09-05 14:06:47,592] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_06_optim_states.pt [default6]:[2022-09-05 14:06:47,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt. [default6]:[2022-09-05 14:06:47,785] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_07_optim_states.pt [default2]:[2022-09-05 14:06:47,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt. 
[default2]:[2022-09-05 14:06:47,796] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_06_optim_states.pt [default2]:[2022-09-05 14:06:48,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt. [default2]:[2022-09-05 14:06:48,214] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_70_optim_states.pt [default5]:[2022-09-05 14:06:48,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt. [default5]:[2022-09-05 14:06:48,355] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_07_optim_states.pt [default0]:[2022-09-05 14:06:48,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt. [default0]:[2022-09-05 14:06:48,404] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_70_optim_states.pt [default7]:[2022-09-05 14:06:48,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt. 
[default7]:[2022-09-05 14:06:48,424] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_07_optim_states.pt [default3]:[2022-09-05 14:06:48,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt. [default3]:[2022-09-05 14:06:48,587] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_56_optim_states.pt [default1]:[2022-09-05 14:06:48,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt. [default1]:[2022-09-05 14:06:48,630] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_06_optim_states.pt [default0]:[2022-09-05 14:06:48,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt. [default0]:[2022-09-05 14:06:48,922] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_06_optim_states.pt [default0]:[2022-09-05 14:06:48,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt. 
[default0]:[2022-09-05 14:06:48,935] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_56_optim_states.pt [default3]:[2022-09-05 14:06:49,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt. [default3]:[2022-09-05 14:06:49,239] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_70_optim_states.pt [default5]:[2022-09-05 14:06:49,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt. [default5]:[2022-09-05 14:06:49,295] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_57_optim_states.pt [default3]:[2022-09-05 14:06:49,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt. [default3]:[2022-09-05 14:06:49,359] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_08_optim_states.pt [default1]:[2022-09-05 14:06:49,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt. 
[default1]:[2022-09-05 14:06:49,345] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_70_optim_states.pt [default4]:[2022-09-05 14:06:49,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt. [default4]:[2022-09-05 14:06:49,397] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_53_optim_states.pt [default4]:[2022-09-05 14:06:49,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt. [default4]:[2022-09-05 14:06:49,376] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_57_optim_states.pt [default7]:[2022-09-05 14:06:49,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt. [default7]:[2022-09-05 14:06:49,448] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_57_optim_states.pt [default6]:[2022-09-05 14:06:49,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt. 
[default6]:[2022-09-05 14:06:49,588] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_39_optim_states.pt [default1]:[2022-09-05 14:06:49,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt. [default1]:[2022-09-05 14:06:49,577] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_56_optim_states.pt [default1]:[2022-09-05 14:06:49,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt. [default1]:[2022-09-05 14:06:49,795] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_42_optim_states.pt [default2]:[2022-09-05 14:06:49,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt. [default2]:[2022-09-05 14:06:49,977] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_38_optim_states.pt [default2]:[2022-09-05 14:06:49,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt. 
[default2]:[2022-09-05 14:06:49,937] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_46_optim_states.pt [default2]:[2022-09-05 14:06:50,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt. [default2]:[2022-09-05 14:06:50,234] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_14_optim_states.pt [default1]:[2022-09-05 14:06:50,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt. [default1]:[2022-09-05 14:06:50,301] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_54_optim_states.pt [default2]:[2022-09-05 14:06:50,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt. [default2]:[2022-09-05 14:06:50,685] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_26_optim_states.pt [default7]:[2022-09-05 14:06:50,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt. 
[default7]:[2022-09-05 14:06:50,639] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_39_optim_states.pt [default0]:[2022-09-05 14:06:50,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt. [default0]:[2022-09-05 14:06:50,651] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_54_optim_states.pt [default7]:[2022-09-05 14:06:50,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt. [default7]:[2022-09-05 14:06:50,742] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_27_optim_states.pt [default3]:[2022-09-05 14:06:50,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt. [default3]:[2022-09-05 14:06:50,842] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_26_optim_states.pt [default6]:[2022-09-05 14:06:50,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt. 
[default6]:[2022-09-05 14:06:50,947] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_57_optim_states.pt [default6]:[2022-09-05 14:06:51,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt. [default6]:[2022-09-05 14:06:51,030] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_27_optim_states.pt [default0]:[2022-09-05 14:06:51,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt. [default0]:[2022-09-05 14:06:51,008] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_26_optim_states.pt [default7]:[2022-09-05 14:06:51,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt. [default7]:[2022-09-05 14:06:51,158] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_71_optim_states.pt [default7]:[2022-09-05 14:06:51,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt. 
[default7]:[2022-09-05 14:06:51,417] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_31_optim_states.pt [default4]:[2022-09-05 14:06:51,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt. [default4]:[2022-09-05 14:06:51,439] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_71_optim_states.pt [default2]:[2022-09-05 14:06:51,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt. [default2]:[2022-09-05 14:06:51,495] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_52_optim_states.pt [default4]:[2022-09-05 14:06:51,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt. [default4]:[2022-09-05 14:06:51,562] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_55_optim_states.pt [default3]:[2022-09-05 14:06:51,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt. 
[default3]:[2022-09-05 14:06:51,587] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_38_optim_states.pt [default5]:[2022-09-05 14:06:51,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt. [default5]:[2022-09-05 14:06:51,592] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_39_optim_states.pt [default4]:[2022-09-05 14:06:51,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt. [default4]:[2022-09-05 14:06:51,638] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_27_optim_states.pt [default2]:[2022-09-05 14:06:51,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt. [default2]:[2022-09-05 14:06:51,663] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_54_optim_states.pt [default3]:[2022-09-05 14:06:51,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt. 
[default3]:[2022-09-05 14:06:51,626] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_52_optim_states.pt [default0]:[2022-09-05 14:06:51,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt. [default0]:[2022-09-05 14:06:51,634] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_52_optim_states.pt [default4]:[2022-09-05 14:06:51,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt. [default4]:[2022-09-05 14:06:51,661] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_39_optim_states.pt [default6]:[2022-09-05 14:06:51,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt. [default6]:[2022-09-05 14:06:51,848] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_71_optim_states.pt [default1]:[2022-09-05 14:06:51,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt. 
[default1]:[2022-09-05 14:06:51,856] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_26_optim_states.pt [default1]:[2022-09-05 14:06:52,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt. [default1]:[2022-09-05 14:06:52,202] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_38_optim_states.pt [default5]:[2022-09-05 14:06:52,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt. [default5]:[2022-09-05 14:06:52,204] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_27_optim_states.pt [default3]:[2022-09-05 14:06:52,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt. [default3]:[2022-09-05 14:06:52,368] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_02_optim_states.pt [default0]:[2022-09-05 14:06:52,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt. 
[default0]:[2022-09-05 14:06:52,337] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_38_optim_states.pt [default6]:[2022-09-05 14:06:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt. [default6]:[2022-09-05 14:06:52,420] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_53_optim_states.pt [default7]:[2022-09-05 14:06:52,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt. [default7]:[2022-09-05 14:06:52,419] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_53_optim_states.pt [default5]:[2022-09-05 14:06:52,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt. [default5]:[2022-09-05 14:06:52,434] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_71_optim_states.pt [default2]:[2022-09-05 14:06:52,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt. 
[default2]:[2022-09-05 14:06:52,711] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_02_optim_states.pt [default6]:[2022-09-05 14:06:52,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt. [default6]:[2022-09-05 14:06:52,821] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_31_optim_states.pt [default2]:[2022-09-05 14:06:52,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt. [default2]:[2022-09-05 14:06:52,926] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_30_optim_states.pt [default3]:[2022-09-05 14:06:52,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt. [default3]:[2022-09-05 14:06:52,875] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_30_optim_states.pt [default4]:[2022-09-05 14:06:52,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt. 
[default4]:[2022-09-05 14:06:52,943] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_31_optim_states.pt [default5]:[2022-09-05 14:06:53,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt. [default5]:[2022-09-05 14:06:53,065] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_31_optim_states.pt [default3]:[2022-09-05 14:06:53,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt. [default3]:[2022-09-05 14:06:53,231] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_54_optim_states.pt [default1]:[2022-09-05 14:06:53,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt. [default1]:[2022-09-05 14:06:53,404] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_52_optim_states.pt [default5]:[2022-09-05 14:06:53,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt. 
[default5]:[2022-09-05 14:06:53,364] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_55_optim_states.pt [default1]:[2022-09-05 14:06:53,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt. [default1]:[2022-09-05 14:06:53,515] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_30_optim_states.pt [default5]:[2022-09-05 14:06:53,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt. [default5]:[2022-09-05 14:06:53,824] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_01_optim_states.pt [default6]:[2022-09-05 14:06:53,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt. [default6]:[2022-09-05 14:06:53,986] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_55_optim_states.pt [default0]:[2022-09-05 14:06:54,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt. 
[default0]:[2022-09-05 14:06:54,165] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_30_optim_states.pt [default7]:[2022-09-05 14:06:54,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt. [default7]:[2022-09-05 14:06:54,241] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_55_optim_states.pt [default3]:[2022-09-05 14:06:54,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt. [default3]:[2022-09-05 14:06:54,376] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_12_optim_states.pt [default4]:[2022-09-05 14:06:54,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt. [default4]:[2022-09-05 14:06:54,410] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_01_optim_states.pt [default7]:[2022-09-05 14:06:55,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt. 
[default7]:[2022-09-05 14:06:55,015] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_01_optim_states.pt [default6]:[2022-09-05 14:06:55,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt. [default6]:[2022-09-05 14:06:55,166] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_01_optim_states.pt [default6]:[2022-09-05 14:06:55,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt. [default6]:[2022-09-05 14:06:55,400] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_13_optim_states.pt [default1]:[2022-09-05 14:06:55,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt. [default1]:[2022-09-05 14:06:55,447] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_02_optim_states.pt [default2]:[2022-09-05 14:06:55,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt. 
[default2]:[2022-09-05 14:06:55,507] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_12_optim_states.pt [default2]:[2022-09-05 14:06:55,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. [default2]:[2022-09-05 14:06:55,447] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt [default1]:[2022-09-05 14:06:55,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt. [default1]:[2022-09-05 14:06:55,661] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_12_optim_states.pt [default7]:[2022-09-05 14:06:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt. [default7]:[2022-09-05 14:06:55,765] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_17_optim_states.pt [default3]:[2022-09-05 14:06:55,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
[default3]:[2022-09-05 14:06:55,805] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt [default6]:[2022-09-05 14:06:55,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt. [default6]:[2022-09-05 14:06:55,871] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_17_optim_states.pt [default3]:[2022-09-05 14:06:55,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt. [default3]:[2022-09-05 14:06:55,869] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_16_optim_states.pt [default2]:[2022-09-05 14:06:55,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt. [default2]:[2022-09-05 14:06:55,936] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_16_optim_states.pt [default0]:[2022-09-05 14:06:55,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt. 
[default0]:[2022-09-05 14:06:55,998] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_12_optim_states.pt [default5]:[2022-09-05 14:06:56,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt. [default5]:[2022-09-05 14:06:56,046] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_13_optim_states.pt [default7]:[2022-09-05 14:06:56,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt. [default7]:[2022-09-05 14:06:56,039] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_13_optim_states.pt [default4]:[2022-09-05 14:06:56,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt. [default4]:[2022-09-05 14:06:56,074] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_17_optim_states.pt [default0]:[2022-09-05 14:06:56,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt. 
[default0]:[2022-09-05 14:06:56,159] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_02_optim_states.pt [default1]:[2022-09-05 14:06:56,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt. [default1]:[2022-09-05 14:06:56,344] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_16_optim_states.pt [default5]:[2022-09-05 14:06:56,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt. [default5]:[2022-09-05 14:06:56,457] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_17_optim_states.pt [default0]:[2022-09-05 14:06:56,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt. [default0]:[2022-09-05 14:06:56,530] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_16_optim_states.pt [default0]:[2022-09-05 14:06:56,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[default0]:[2022-09-05 14:06:56,861] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [default4]:[2022-09-05 14:06:57,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt. [default4]:[2022-09-05 14:06:57,007] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_13_optim_states.pt [default1]:[2022-09-05 14:06:57,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. [default1]:[2022-09-05 14:06:57,503] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt [default5]:[2022-09-05 14:06:58,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt. [default5]:[2022-09-05 14:06:58,701] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_1_mp_rank_03_optim_states.pt [default4]:[2022-09-05 14:07:00,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt. 
[default4]:[2022-09-05 14:07:00,448] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_0_mp_rank_03_optim_states.pt [default6]:[2022-09-05 14:07:01,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt. [default6]:[2022-09-05 14:07:01,439] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_2_mp_rank_03_optim_states.pt [default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! 
[default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default6]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! 
[default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default6]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! 
[default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default6]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default4]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default7]:time (ms) | save-checkpoint: 29777.71
[default6]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default1]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default0]: successfully saved checkpoint at iteration 1068 to /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]:[Detected kill switch at /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf. Exiting] datetime: 2022-09-05 14:07:01
[default7]:[2022-09-05 14:07:01,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt.
[default7]:[2022-09-05 14:07:01,498] [INFO] [engine.py:3188:_save_zero_checkpoint] bf16_zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step1068/bf16_zero_pp_rank_3_mp_rank_03_optim_states.pt
[default7]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default2]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default0]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now!
[default5]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! [default3]:[2022-09-05 14:07:01,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1068 is ready now! WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. 
None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.1005524.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 
2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 10 [default0]: eval_only ....................................... True [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 
0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... 
torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. None [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... 
None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... 
True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]:Offline mode: forcing local_files_only=True [default0]: vocab file is un-used. 
loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default7]:> setting tensorboard ... [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... 
[default0]:[2022-09-05 14:15:44,428] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-05 14:16:02,681] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.097 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 6.986 seconds [default0]:[Detected kill switch at /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf. Exiting] datetime: 2022-09-05 14:16:09 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
***************************************** [default7]:> setting tensorboard ... [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ...................
True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. 
None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.1006044.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 10 [default0]: eval_only ....................................... True [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 
2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ 
False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 
3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. None [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 
1.0 [default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... 
['train'] [default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... 
[['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... [['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... 
False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. loading tokenizer from pre-trained model [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... 
['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-05 14:18:05,459] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-05 14:18:13,763] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.087 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. 
[default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.186 seconds [default0]:time to initialize megatron (seconds): 13.037 [default0]:[after megatron is initialized] datetime: 2022-09-05 14:18:21 [default0]:building GPT model ... 
[default0]:[2022-09-05 14:18:21,078] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-05 14:18:21,078] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-05 14:18:21,078] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.08 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-05 14:18:24,984] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe 
[default0]:stage=7 layers=1 [default0]: 9: ParallelTransformerLayerPipe [default0]:stage=8 layers=1 [default0]: 10: ParallelTransformerLayerPipe [default0]:stage=9 layers=1 [default0]: 11: ParallelTransformerLayerPipe [default0]:stage=10 layers=1 [default0]: 12: ParallelTransformerLayerPipe [default0]:stage=11 layers=1 [default0]: 13: ParallelTransformerLayerPipe [default0]:stage=12 layers=1 [default0]: 14: ParallelTransformerLayerPipe [default0]:stage=13 layers=1 [default0]: 15: ParallelTransformerLayerPipe [default0]:stage=14 layers=1 [default0]: 16: ParallelTransformerLayerPipe [default0]:stage=15 layers=1 [default0]: 17: ParallelTransformerLayerPipe [default0]:stage=16 layers=1 [default0]: 18: ParallelTransformerLayerPipe [default0]:stage=17 layers=1 [default0]: 19: ParallelTransformerLayerPipe [default0]:stage=18 layers=1 [default0]: 20: ParallelTransformerLayerPipe [default0]:stage=19 layers=1 [default0]: 21: ParallelTransformerLayerPipe [default0]:stage=20 layers=1 [default0]: 22: ParallelTransformerLayerPipe [default0]:stage=21 layers=1 [default0]: 23: ParallelTransformerLayerPipe [default0]:stage=22 layers=1 [default0]: 24: ParallelTransformerLayerPipe [default0]:stage=23 layers=1 [default0]: 25: ParallelTransformerLayerPipe [default0]:stage=24 layers=1 [default0]: 26: ParallelTransformerLayerPipe [default0]:stage=25 layers=1 [default0]: 27: ParallelTransformerLayerPipe [default0]:stage=26 layers=1 [default0]: 28: ParallelTransformerLayerPipe [default0]:stage=27 layers=1 [default0]: 29: ParallelTransformerLayerPipe [default0]:stage=28 layers=1 [default0]: 30: ParallelTransformerLayerPipe [default0]:stage=29 layers=1 [default0]: 31: ParallelTransformerLayerPipe [default0]:stage=30 layers=1 [default0]: 32: ParallelTransformerLayerPipe [default0]:stage=31 layers=1 [default0]: 33: ParallelTransformerLayerPipe [default0]:stage=32 layers=1 [default0]: 34: ParallelTransformerLayerPipe [default0]:stage=33 layers=1 [default0]: 35: ParallelTransformerLayerPipe 
[default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe 
[default0]:stage=61 layers=1 [default0]: 63: ParallelTransformerLayerPipe [default0]:stage=62 layers=1 [default0]: 64: ParallelTransformerLayerPipe [default0]:stage=63 layers=1 [default0]: 65: ParallelTransformerLayerPipe [default0]:stage=64 layers=1 [default0]: 66: ParallelTransformerLayerPipe [default0]:stage=65 layers=1 [default0]: 67: ParallelTransformerLayerPipe [default0]:stage=66 layers=1 [default0]: 68: ParallelTransformerLayerPipe [default0]:stage=67 layers=1 [default0]: 69: ParallelTransformerLayerPipe [default0]:stage=68 layers=1 [default0]: 70: ParallelTransformerLayerPipe [default0]:stage=69 layers=1 [default0]: 71: ParallelTransformerLayerPipe [default0]:stage=70 layers=3 [default0]: 72: ParallelTransformerLayerPipe [default0]: 73: undo [default0]: 74: MixedFusedLayerNorm [default0]:stage=71 layers=2 [default0]: 75: EmbeddingPipe [default0]: 76: float16_to_fp32 [default0]: loss: CrossEntropy [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default1]:Building extension module utils... [default1]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:ninja: no work to do. [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Time to load utils op: 0.3711378574371338 seconds [default0]:Time to load utils op: 0.37107229232788086 seconds [default2]:Time to load utils op: 0.3710789680480957 seconds [default3]:Time to load utils op: 0.3710780143737793 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Time to load utils op: 0.40547609329223633 seconds [default5]:Time to load utils op: 0.40554189682006836 seconds [default6]:Time to load utils op: 0.4051051139831543 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Time to load utils op: 0.4051492214202881 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default1]:Building extension module utils... [default1]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005450248718261719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007152557373046875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006492137908935547 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006973743438720703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0010616779327392578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011124610900878906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0011477470397949219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009458065032958984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11055350303649902 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11052107810974121 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.11051702499389648 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11049985885620117 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.11050128936767578 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10878229141235352 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10648965835571289 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2101907730102539 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21019506454467773 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11141276359558105 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.1089777946472168 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.1091303825378418 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11787271499633789 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.1094045639038086 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21018362045288086 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2101907730102539 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10597968101501465 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10945820808410645 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.11785626411437988 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.109375 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.1178429126739502 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.11052894592285156 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.11051249504089355 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.11051440238952637 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.11051297187805176 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21030926704406738 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21030068397521973 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21029448509216309 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21029901504516602 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.10599279403686523 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.10533523559570312 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.10626745223999023 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20859670639038086 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.11783552169799805 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.10573530197143555 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20860791206359863 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20972251892089844 seconds [default0]:Time to load utils op: 0.20972323417663574 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10292196273803711 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20860028266906738 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20876288414001465 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20858430862426758 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2085709571838379 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20856928825378418 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10466957092285156 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20876383781433105 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10549235343933105 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.10515260696411133 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2097301483154297 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10189032554626465 seconds [default2]:Time to load utils op: 0.2097187042236328 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20244956016540527 seconds [default4]:Loading extension module utils... [default1]:ninja: no work to do. [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.23596477508544922 seconds [default4]:Time to load utils op: 0.20842599868774414 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20844268798828125 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20242691040039062 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20876336097717285 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20843029022216797 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20877718925476074 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10466957092285156 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20856761932373047 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2082653045654297 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20826315879821777 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10500359535217285 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20827126502990723 seconds [default1]:Time to load utils op: 0.2241060733795166 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20218420028686523 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2025918960571289 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2023153305053711 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10520052909851074 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10569953918457031 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.202467679977417 seconds [default4]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10227346420288086 seconds [default4]:Time to load utils op: 0.20230674743652344 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1047358512878418 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024400234222412 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2240920066833496 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.22409749031066895 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.22411155700683594 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20859384536743164 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21808385848999023 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21808338165283203 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.11573290824890137 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.2084369659423828 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2082657814025879 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.11603307723999023 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10190248489379883 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.11624264717102051 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.11577963829040527 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2040712833404541 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2043778896331787 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20380592346191406 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2180795669555664 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21807074546813965 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2086641788482666 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2026529312133789 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027127742767334 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20256471633911133 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20251965522766113 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31859421730041504 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.20250344276428223 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0027539730072021484 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20345664024353027 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Time to load utils op: 0.0004775524139404297 seconds [default0]:Time to load utils op: 0.0006449222564697266 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20234394073486328 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20615267753601074 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027273178100586 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004620552062988281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004734992980957031 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21133637428283691 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0027391910552978516 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20285463333129883 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20285534858703613 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20249223709106445 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20296430587768555 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20929908752441406 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2025611400604248 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2083132266998291 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20853924751281738 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21135497093200684 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20272135734558105 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20249533653259277 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2082509994506836 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.21132874488830566 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20363926887512207 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2091379165649414 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21356463432312012 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21357393264770508 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21358370780944824 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20871591567993164 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20874524116516113 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20245647430419922 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2087247371673584 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20257830619812012 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20873260498046875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20266127586364746 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20276784896850586 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3186023235321045 seconds [default1]:Time to load utils op: 0.3184797763824463 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20359492301940918 seconds [default4]:Time to load utils op: 0.20516395568847656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2026057243347168 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.20334959030151367 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20237278938293457 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20244622230529785 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20898032188415527 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024393081665039 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20340943336486816 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20255494117736816 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006062984466552734 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21143555641174316 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20249629020690918 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20897698402404785 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21142935752868652 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2083749771118164 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20839262008666992 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2083742618560791 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006899833679199219 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20898151397705078 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20837998390197754 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20899176597595215 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3140695095062256 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004699230194091797 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006949901580810547 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2027568817138672 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20238971710205078 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20250344276428223 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2024550437927246 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20842981338500977 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20394682884216309 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20229434967041016 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.20842242240905762 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005567073822021484 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.314119815826416 seconds [default5]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31404662132263184 seconds [default5]:Time to load utils op: 0.21145200729370117 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2114570140838623 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20492935180664062 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20415496826171875 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2021350860595703 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.202742338180542 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3139324188232422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006356239318847656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20245957374572754 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20842599868774414 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:No modifications detected for re-loaded extension module utils, skipping build step...
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.0004017353057861328 seconds
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:No modifications detected for re-loaded extension module utils, skipping build step...
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.000659942626953125 seconds
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.20244359970092773 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.203171968460083 seconds
[default0]:[2022-09-05 14:18:26,727] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-05 14:18:26,728] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-05 14:18:26,728] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.47 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.31017255783081055 seconds
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-05 14:18:26,728] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:No modifications detected for re-loaded extension module utils, skipping build step...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.0006670951843261719 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3101637363433838 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20247149467468262 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2113204002380371 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20278048515319824 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20379018783569336 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006506443023681641 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2084341049194336 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2038273811340332 seconds [default3]:Time to load utils op: 0.0004477500915527344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005404949188232422 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.2091057300567627 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005362033843994141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004291534423828125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00045180320739746094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024376392364502 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20261669158935547 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004940032958984375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0004146099090576172 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005509853363037109 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20259428024291992 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20242547988891602 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.204132080078125 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024545669555664 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21091270446777344 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21091055870056152 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2068159580230713 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20637130737304688 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2179403305053711 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20872163772583008 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20873188972473145 seconds [default3]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2087113857269287 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2026979923248291 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.20839524269104004 seconds [default3]:Time to load utils op: 0.21355223655700684 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2087082862854004 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20271801948547363 seconds [default4]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2135930061340332 seconds [default4]:Time to load utils op: 0.2135639190673828 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00061798095703125 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2026371955871582 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20638680458068848 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20617437362670898 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2061772346496582 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21091032028198242 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2109239101409912 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20252490043640137 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20254802703857422 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20253539085388184 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20836997032165527 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.20791840553283691 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20583462715148926 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003826618194580078 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00041413307189941406 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20867156982421875 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21794366836547852 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20578289031982422 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20262408256530762 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21794819831848145 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20258045196533203 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20857715606689453 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20250844955444336 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2025444507598877 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006005764007568359 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00048065185546875 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20241284370422363 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3102762699127197 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006079673767089844 seconds [default1]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024672031402588 seconds [default5]:Loading extension module utils... [default1]:Time to load utils op: 0.2025442123413086 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2085723876953125 seconds [default5]:Time to load utils op: 0.20618224143981934 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20856261253356934 seconds [default5]:Time to load utils op: 0.2085733413696289 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2062363624572754 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.20247840881347656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006768703460693359 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007030963897705078 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2025907039642334 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2025589942932129 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21793222427368164 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2063312530517578 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20264458656311035 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006353855133056641 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20249676704406738 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0003933906555175781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003662109375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005927085876464844 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004405975341796875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20250272750854492 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20257854461669922 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20254015922546387 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20234227180480957 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20238637924194336 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20806622505187988 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0006346702575683594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000614166259765625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20276808738708496 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20251870155334473 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20625090599060059 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006656646728515625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00043320655822753906 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.202362060546875 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20236539840698242 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20229220390319824 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20772743225097656 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20787811279296875 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.20260977745056152 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2025439739227295 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20238423347473145 seconds [default3]:Time to load utils op: 0.20866990089416504 seconds [default1]:Time to load utils op: 0.20866703987121582 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20250225067138672 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20250749588012695 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31841111183166504 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005245208740234375 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.22501063346862793 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2080683708190918 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.22499561309814453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007569789886474609 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20874595642089844 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22502517700195312 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.22501897811889648 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20230746269226074 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007240772247314453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004467964172363281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003781318664550781 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.00069427490234375 seconds [default1]:Time to load utils op: 0.0005669593811035156 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004563331604003906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000522613525390625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006110668182373047 seconds [default3]:Time to load utils op: 0.0005321502685546875 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005943775177001953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0005884170532226562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006737709045410156 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007290840148925781 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006511211395263672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006792545318603516 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005340576171875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0006072521209716797 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00046253204345703125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006515979766845703 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005457401275634766 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000629425048828125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003368854522705078 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005118846893310547 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004820823669433594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004181861877441406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00047206878662109375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007779598236083984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00046825408935546875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007243156433105469 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006704330444335938 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00037598609924316406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default2]:Loading extension module utils... [default7]:Time to load utils op: 0.0007178783416748047 seconds [default2]:Time to load utils op: 0.00043654441833496094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043272972106933594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00043892860412597656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006833076477050781 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004885196685791016 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00045752525329589844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006854534149169922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005629062652587891 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007867813110351562 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004475116729736328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00040650367736816406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00044417381286621094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006465911865234375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00048470497131347656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006563663482666016 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046896934509277344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000469207763671875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005767345428466797 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005495548248291016 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005273818969726562 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004258155822753906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006923675537109375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004973411560058594 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004017353057861328 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008919239044189453 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004763603210449219 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004341602325439453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005121231079101562 seconds [default1]:Time to load utils op: 0.00043773651123046875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00044536590576171875 seconds [default6]:Time to load utils op: 0.00042128562927246094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004718303680419922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008285045623779297 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007150173187255859 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.00072479248046875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006978511810302734 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007498264312744141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008573532104492188 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default2]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003867149353027344 seconds [default2]:Time to load utils op: 0.0007336139678955078 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046563148498535156 seconds [default6]:Time to load utils op: 0.0004830360412597656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005333423614501953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005285739898681641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006422996520996094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004999637603759766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004839897155761719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006098747253417969 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00045037269592285156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00041174888610839844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007762908935546875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00036215782165527344 seconds [default7]:Time to load utils op: 0.0005238056182861328 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005855560302734375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00041675567626953125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0004520416259765625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005764961242675781 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005137920379638672 seconds [default0]:Time to load utils op: 0.000518798828125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005257129669189453 seconds [default3]:Time to load utils op: 0.0003986358642578125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003368854522705078 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00069427490234375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004489421844482422 seconds [default4]:Time to load utils op: 0.00048804283142089844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004975795745849609 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00048542022705078125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005464553833007812 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008006095886230469 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003724098205566406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043272972106933594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003933906555175781 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000362396240234375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00036597251892089844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00038051605224609375 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005061626434326172 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000759124755859375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007634162902832031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005464553833007812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00046253204345703125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010867118835449219 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004296302795410156 seconds [default0]:Time to load utils op: 0.0008585453033447266 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0010371208190917969 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009045600891113281 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005106925964355469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006747245788574219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default5]:Time to load utils op: 0.0003542900085449219 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005886554718017578 seconds [default7]:Time to load utils op: 0.00044918060302734375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006990432739257812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006506443023681641 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041294097900390625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005333423614501953 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00042366981506347656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004947185516357422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005612373352050781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00034356117248535156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004711151123046875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005648136138916016 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004315376281738281 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00037598609924316406 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005695819854736328 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0004355907440185547 seconds [default4]:Time to load utils op: 0.00043964385986328125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004379749298095703 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004420280456542969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004444122314453125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Time to load utils op: 0.0003795623779296875 seconds [default4]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0005526542663574219 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00034809112548828125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005784034729003906 seconds [default3]:Time to load utils op: 0.0007829666137695312 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004570484161376953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003743171691894531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043654441833496094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000507354736328125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046539306640625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005185604095458984 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00035119056701660156 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004642009735107422 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003857612609863281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004322528839111328 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045609474182128906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004379749298095703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004165172576904297 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007784366607666016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004851818084716797 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00047659873962402344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003781318664550781 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009253025054931641 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default0]:Time to load utils op: 0.0008375644683837891 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00072479248046875 seconds [default7]:Time to load utils op: 0.0008454322814941406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000476837158203125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00043845176696777344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004558563232421875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006034374237060547 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006937980651855469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006966590881347656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005209445953369141 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00036072731018066406 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004563331604003906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006268024444580078 seconds [default7]:Time to load utils op: 0.00045418739318847656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006546974182128906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0011241436004638672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00047016143798828125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.001275777816772461 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009200572967529297 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005095005035400391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004899501800537109 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007452964782714844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006077289581298828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006067752838134766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007951259613037109 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007214546203613281 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006878376007080078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007636547088623047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006270408630371094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006368160247802734 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00081634521484375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010066032409667969 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006773471832275391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004916191101074219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005936622619628906 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000759124755859375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009276866912841797 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000728607177734375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009164810180664062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007388591766357422 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007798671722412109 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005488395690917969 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005881786346435547 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007772445678710938 seconds [default0]:Time to load utils op: 0.0005655288696289062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007579326629638672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006120204925537109 seconds [default6]:Time to load utils op: 0.0006220340728759766 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010752677917480469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0010764598846435547 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005831718444824219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006251335144042969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006492137908935547 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008575916290283203 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011730194091796875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008797645568847656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0008671283721923828 seconds [default7]:Time to load utils op: 0.0011587142944335938 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.001138448715209961 seconds [default6]:Time to load utils op: 0.0008733272552490234 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006794929504394531 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008490085601806641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0011034011840820312 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.001016855239868164 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001196146011352539 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008285045623779297 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-05 14:18:27,443] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-05 14:18:27,443] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-05 14:18:27,443] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-05 14:18:27,443] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-05 14:18:27,443] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default0]:[2022-09-05 14:18:27,468] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-05 14:18:27,468] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:18:27,469] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20316839218139648 seconds [default0]:[2022-09-05 14:18:27,696] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-05 14:18:27,696] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:18:27,696] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.23857927322387695 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3077585697174072 seconds [default0]:[2022-09-05 14:18:27,747] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-05 14:18:27,748] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:18:27,748] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:18:27,770] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-05 14:18:27,771] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:18:27,771] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:18:27,794] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-05 14:18:27,795] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:18:27,795] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005586147308349609 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30539703369140625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3047487735748291 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3053865432739258 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30794358253479004 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.3081035614013672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0012822151184082031 seconds [default0]:[2022-09-05 14:18:27,817] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-05 14:18:27,818] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:18:27,818] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:18:27,889] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-05 14:18:27,889] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:18:27,890] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003752708435058594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045228004455566406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004334449768066406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0014023780822753906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0016214847564697266 seconds [default0]:[2022-09-05 14:18:27,913] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-05 14:18:27,913] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:18:27,913] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:18:27,913] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-05 14:18:27,913] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-05 14:18:27,913] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-05 14:18:27,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: 
"synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] communication_data_type ...... 
None [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_enabled ........... 
False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_layer_name ........ bert.encoder.layer [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-05 14:18:27,914] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] initial_dynamic_scale ........ 
1 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] loss_scale ................... 1.0 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] sparse_gradients_enabled ..... 
False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] train_batch_size ............. 2048 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-05 14:18:27,915] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004467964172363281 seconds [default0]:[2022-09-05 14:18:27,916] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,530] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,522] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:18:28,521] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default1]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:18:29,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:18:29,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:18:29,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:18:29,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:18:29,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:18:29,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:18:29,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:18:38,900] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default7]:[2022-09-05 14:18:38,995] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default3]:[2022-09-05 14:18:40,886] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default3]:[2022-09-05 14:18:41,159] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default3]:[2022-09-05 14:18:41,238] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default3]:[2022-09-05 14:18:41,320] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default7]:[2022-09-05 14:18:41,375] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default5]:[2022-09-05 14:18:41,995] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default4]:[2022-09-05 14:18:41,986] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default3]:[2022-09-05 14:18:42,194] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default7]:[2022-09-05 14:18:42,174] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default2]:[2022-09-05 14:18:42,498] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default3]:[2022-09-05 14:18:43,088] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default0]:[2022-09-05 14:18:43,097] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default1]:[2022-09-05 14:18:43,099] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default3]:[2022-09-05 14:18:43,090] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default5]:[2022-09-05 14:18:43,164] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default4]:[2022-09-05 14:18:43,167] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default3]:[2022-09-05 14:18:43,344] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default7]:[2022-09-05 14:18:43,420] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default7]:[2022-09-05 14:18:43,394] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default3]:[2022-09-05 14:18:43,601] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default3]:[2022-09-05 14:18:43,581] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default7]:[2022-09-05 14:18:43,657] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default6]:[2022-09-05 14:18:43,833] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default2]:[2022-09-05 14:18:44,007] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default3]:[2022-09-05 14:18:43,966] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default3]:[2022-09-05 14:18:44,175] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default7]:[2022-09-05 14:18:44,221] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default7]:[2022-09-05 14:18:44,218] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default7]:[2022-09-05 14:18:44,231] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default7]:[2022-09-05 14:18:44,323] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default3]:[2022-09-05 14:18:44,348] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default7]:[2022-09-05 14:18:44,404] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default7]:[2022-09-05 14:18:44,363] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default3]:[2022-09-05 14:18:44,358] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default7]:[2022-09-05 14:18:44,369] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default3]:[2022-09-05 14:18:44,525] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default3]:[2022-09-05 14:18:44,605] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default3]:[2022-09-05 14:18:44,672] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default7]:[2022-09-05 14:18:44,688] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default3]:[2022-09-05 14:18:44,651] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default7]:[2022-09-05 14:18:44,663] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default3]:[2022-09-05 14:18:44,731] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default4]:[2022-09-05 14:18:44,817] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default5]:[2022-09-05 14:18:44,818] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default2]:[2022-09-05 14:18:44,821] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default3]:[2022-09-05 14:18:44,828] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default3]:[2022-09-05 14:18:44,756] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default7]:[2022-09-05 14:18:44,849] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default0]:[2022-09-05 14:18:44,960] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default1]:[2022-09-05 14:18:44,960] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default3]:[2022-09-05 14:18:45,057] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default6]:[2022-09-05 14:18:45,174] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default3]:[2022-09-05 14:18:45,137] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default7]:[2022-09-05 14:18:45,275] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default2]:[2022-09-05 14:18:45,239] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default7]:[2022-09-05 14:18:45,329] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default7]:[2022-09-05 14:18:45,394] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default0]:[2022-09-05 14:18:45,446] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default7]:[2022-09-05 14:18:45,488] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default1]:[2022-09-05 14:18:45,457] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default2]:[2022-09-05 14:18:45,569] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default5]:[2022-09-05 14:18:45,555] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default4]:[2022-09-05 14:18:45,548] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default7]:[2022-09-05 14:18:45,806] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default7]:[2022-09-05 14:18:45,807] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default7]:[2022-09-05 14:18:45,826] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default3]:[2022-09-05 14:18:45,821] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default7]:[2022-09-05 14:18:45,818] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default7]:[2022-09-05 14:18:46,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default2]:[2022-09-05 14:18:46,025] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default3]:[2022-09-05 14:18:45,972] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default7]:[2022-09-05 14:18:45,935] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default6]:[2022-09-05 14:18:45,935] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default3]:[2022-09-05 14:18:45,974] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default6]:[2022-09-05 14:18:45,950] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default7]:[2022-09-05 14:18:45,943] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default5]:[2022-09-05 14:18:46,033] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default4]:[2022-09-05 14:18:46,035] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default5]:[2022-09-05 14:18:46,133] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default2]:[2022-09-05 14:18:46,046] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default5]:[2022-09-05 14:18:46,115] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default4]:[2022-09-05 14:18:46,104] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default4]:[2022-09-05 14:18:46,123] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default7]:[2022-09-05 14:18:46,201] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default3]:[2022-09-05 14:18:46,134] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default0]:[2022-09-05 14:18:46,151] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default2]:[2022-09-05 14:18:46,213] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default7]:[2022-09-05 14:18:46,166] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default1]:[2022-09-05 14:18:46,152] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default5]:[2022-09-05 14:18:46,278] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default4]:[2022-09-05 14:18:46,275] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default3]:[2022-09-05 14:18:46,282] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default3]:[2022-09-05 14:18:46,291] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default0]:[2022-09-05 14:18:46,410] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default1]:[2022-09-05 14:18:46,413] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default0]:[2022-09-05 14:18:46,493] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default3]:[2022-09-05 14:18:46,495] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default2]:[2022-09-05 14:18:46,489] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default1]:[2022-09-05 14:18:46,490] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-05 14:18:46,498] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default0]:[2022-09-05 14:18:46,616] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default2]:[2022-09-05 14:18:46,592] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default1]:[2022-09-05 14:18:46,626] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default1]:[2022-09-05 14:18:46,548] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default7]:[2022-09-05 14:18:46,685] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default2]:[2022-09-05 14:18:46,646] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default3]:[2022-09-05 14:18:46,674] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default2]:[2022-09-05 14:18:46,670] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default4]:[2022-09-05 14:18:46,642] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default5]:[2022-09-05 14:18:46,653] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default5]:[2022-09-05 14:18:46,683] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default6]:[2022-09-05 14:18:46,672] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default7]:[2022-09-05 14:18:46,702] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default4]:[2022-09-05 14:18:46,675] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default5]:[2022-09-05 14:18:46,751] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default4]:[2022-09-05 14:18:46,746] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default1]:[2022-09-05 14:18:46,787] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default0]:[2022-09-05 14:18:46,787] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default4]:[2022-09-05 14:18:46,770] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default5]:[2022-09-05 14:18:46,779] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default4]:[2022-09-05 14:18:46,776] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default5]:[2022-09-05 14:18:46,778] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default1]:[2022-09-05 14:18:46,910] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default0]:[2022-09-05 14:18:46,909] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default2]:[2022-09-05 14:18:46,869] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default0]:[2022-09-05 14:18:46,866] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default1]:[2022-09-05 14:18:46,859] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default4]:[2022-09-05 14:18:46,870] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default5]:[2022-09-05 14:18:46,872] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default6]:[2022-09-05 14:18:46,905] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default7]:[2022-09-05 14:18:46,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default7]:[2022-09-05 14:18:46,941] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default0]:[2022-09-05 14:18:47,120] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default1]:[2022-09-05 14:18:47,107] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default2]:[2022-09-05 14:18:47,052] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default6]:[2022-09-05 14:18:47,038] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default6]:[2022-09-05 14:18:47,046] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default6]:[2022-09-05 14:18:47,102] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default4]:[2022-09-05 14:18:47,121] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default1]:[2022-09-05 14:18:47,126] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default0]:[2022-09-05 14:18:47,125] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default5]:[2022-09-05 14:18:47,128] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default5]:[2022-09-05 14:18:47,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default4]:[2022-09-05 14:18:47,074] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default7]:[2022-09-05 14:18:47,132] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default6]:[2022-09-05 14:18:47,069] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default1]:[2022-09-05 14:18:47,127] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default0]:[2022-09-05 14:18:47,136] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default6]:[2022-09-05 14:18:47,187] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default6]:[2022-09-05 14:18:47,231] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default2]:[2022-09-05 14:18:47,145] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default6]:[2022-09-05 14:18:47,204] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default4]:[2022-09-05 14:18:47,288] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default1]:[2022-09-05 14:18:47,302] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default1]:[2022-09-05 14:18:47,235] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default2]:[2022-09-05 14:18:47,320] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default0]:[2022-09-05 14:18:47,326] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default5]:[2022-09-05 14:18:47,288] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default2]:[2022-09-05 14:18:47,330] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default0]:[2022-09-05 14:18:47,239] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default1]:[2022-09-05 14:18:47,275] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default6]:[2022-09-05 14:18:47,314] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default0]:[2022-09-05 14:18:47,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default6]:[2022-09-05 14:18:47,254] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default0]:[2022-09-05 14:18:47,294] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default6]:[2022-09-05 14:18:47,423] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default3]:[2022-09-05 14:18:47,384] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default6]:[2022-09-05 14:18:47,382] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default1]:[2022-09-05 14:18:47,333] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default4]:[2022-09-05 14:18:47,429] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default0]:[2022-09-05 14:18:47,414] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default1]:[2022-09-05 14:18:47,432] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default5]:[2022-09-05 14:18:47,409] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default4]:[2022-09-05 14:18:47,405] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default6]:[2022-09-05 14:18:47,386] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default2]:[2022-09-05 14:18:47,366] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default5]:[2022-09-05 14:18:47,525] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default2]:[2022-09-05 14:18:47,503] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default4]:[2022-09-05 14:18:47,512] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default4]:[2022-09-05 14:18:47,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default0]:[2022-09-05 14:18:47,468] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default1]:[2022-09-05 14:18:47,469] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default5]:[2022-09-05 14:18:47,437] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default5]:[2022-09-05 14:18:47,532] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default1]:[2022-09-05 14:18:47,541] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default0]:[2022-09-05 14:18:47,549] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default0]:[2022-09-05 14:18:47,562] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default1]:[2022-09-05 14:18:47,572] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default2]:[2022-09-05 14:18:47,590] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default4]:[2022-09-05 14:18:47,559] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default5]:[2022-09-05 14:18:47,545] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default4]:[2022-09-05 14:18:47,547] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default5]:[2022-09-05 14:18:47,570] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default2]:[2022-09-05 14:18:47,569] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default5]:[2022-09-05 14:18:47,561] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default6]:[2022-09-05 14:18:47,549] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default4]:[2022-09-05 14:18:47,547] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default4]:[2022-09-05 14:18:47,629] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default6]:[2022-09-05 14:18:47,632] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default1]:[2022-09-05 14:18:47,588] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default2]:[2022-09-05 14:18:47,586] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default0]:[2022-09-05 14:18:47,581] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default2]:[2022-09-05 14:18:47,583] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default6]:[2022-09-05 14:18:47,590] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default6]:[2022-09-05 14:18:47,680] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default1]:[2022-09-05 14:18:47,698] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default0]:[2022-09-05 14:18:47,647] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default0]:[2022-09-05 14:18:47,698] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default6]:[2022-09-05 14:18:47,639] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default0]:[2022-09-05 14:18:47,727] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default5]:[2022-09-05 14:18:47,636] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default6]:[2022-09-05 14:18:47,692] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default5]:[2022-09-05 14:18:47,654] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default1]:[2022-09-05 14:18:47,642] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default4]:[2022-09-05 14:18:47,646] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default6]:[2022-09-05 14:18:47,736] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default0]:[2022-09-05 14:18:47,681] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default1]:[2022-09-05 14:18:47,686] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default6]:[2022-09-05 14:18:47,756] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default4]:[2022-09-05 14:18:47,749] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default1]:[2022-09-05 14:18:47,735] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default4]:[2022-09-05 14:18:47,763] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default5]:[2022-09-05 14:18:47,761] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default2]:[2022-09-05 14:18:47,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default5]:[2022-09-05 14:18:47,752] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default2]:[2022-09-05 14:18:47,787] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default0]:[2022-09-05 14:18:47,893] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default2]:[2022-09-05 14:18:47,826] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default1]:[2022-09-05 14:18:47,922] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default2]:[2022-09-05 14:18:47,858] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default3]:[2022-09-05 14:18:47,845] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default2]:[2022-09-05 14:18:47,860] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default0]:[2022-09-05 14:18:47,928] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default5]:[2022-09-05 14:18:47,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default2]:[2022-09-05 14:18:47,913] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default6]:[2022-09-05 14:18:47,915] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default6]:[2022-09-05 14:18:47,949] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default7]:[2022-09-05 14:18:47,914] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default4]:[2022-09-05 14:18:47,943] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default5]:[2022-09-05 14:18:47,945] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default2]:[2022-09-05 14:18:47,950] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default4]:[2022-09-05 14:18:47,940] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default6]:[2022-09-05 14:18:47,980] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 70 [default1]:[2022-09-05 14:18:47,932] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default1]:[2022-09-05 14:18:48,011] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default3]:[2022-09-05 14:18:47,981] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default0]:[2022-09-05 14:18:48,017] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default2]:[2022-09-05 14:18:47,986] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default6]:[2022-09-05 14:18:47,989] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default6]:[2022-09-05 14:18:47,993] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default2]:[2022-09-05 14:18:48,062] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default0]:[2022-09-05 14:18:48,114] [INFO] [engine.py:2763:_load_zero_checkpoint] 
loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default4]:[2022-09-05 14:18:48,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default5]:[2022-09-05 14:18:48,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default6]:[2022-09-05 14:18:48,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default1]:[2022-09-05 14:18:48,116] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default4]:[2022-09-05 14:18:48,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default5]:[2022-09-05 14:18:48,162] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default6]:[2022-09-05 14:18:48,173] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default6]:[2022-09-05 14:18:48,155] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default4]:[2022-09-05 14:18:48,331] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default6]:[2022-09-05 14:18:48,259] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default2]:[2022-09-05 14:18:48,385] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default0]:[2022-09-05 14:18:48,355] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default5]:[2022-09-05 14:18:48,399] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default1]:[2022-09-05 14:18:48,415] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default1]:[2022-09-05 14:18:48,367] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default4]:[2022-09-05 14:18:48,530] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-05 14:18:48,518] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default0]:[2022-09-05 14:18:48,562] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default0]:[2022-09-05 14:18:48,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default1]:[2022-09-05 14:18:48,550] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default4]:[2022-09-05 14:18:48,552] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default5]:[2022-09-05 14:18:48,585] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default2]:[2022-09-05 14:18:48,711] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default1]:[2022-09-05 14:18:48,724] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default0]:[2022-09-05 14:18:48,649] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default2]:[2022-09-05 14:18:48,689] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default6]:[2022-09-05 14:18:52,175] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default3]:[2022-09-05 14:18:53,216] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default1]:[2022-09-05 14:18:53,392] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default0]:[2022-09-05 14:18:53,392] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]: checkpoint version 3.0 [default7]:[2022-09-05 14:18:53,611] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default2]:[2022-09-05 14:18:53,588] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default4]:[2022-09-05 14:18:53,744] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 95000 [default7]:time (ms) | load-checkpoint: 24297.87 [default5]:[2022-09-05 14:18:53,823] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-05 14:18:53 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 266240 [default0]: test: 20480 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.059482 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.036106 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003572 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.019 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.175226 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.165 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.658868 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.094 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.282946 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.154 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.293449 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.139 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.599993 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.171 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.181138 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.070 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.508598 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.397 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.104457 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.111 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.077856 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.056 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.168748 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.057 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.202274 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.032 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.203485 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.070 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.188998 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.044 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.373805 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.095 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.228903 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.145 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.261421 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.153124 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.032 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.424171 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.025 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.176429 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.072 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.343112 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.050 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.250616 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.044 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.128089 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.015 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.616881 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.317 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.058724 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.174 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.459566 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.143 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.249634 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.149 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.639457 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.185 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.041873 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.011 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios:
[default0]: dataset 0, input: 0.0330676, achieved: 0.0330676
[default0]: dataset 1, input: 0.0112421, achieved: 0.0112421
[default0]: dataset 2, input: 0.130272, achieved: 0.130272
[default0]: dataset 3, input: 0.221712, achieved: 0.221712
[default0]: dataset 4, input: 0.106678, achieved: 0.106678
[default0]: dataset 5, input: 0.00155951, achieved: 0.00155955
[default0]: dataset 6, input: 0.13054, achieved: 0.13054
[default0]: dataset 7, input: 0.010918, achieved: 0.0109181
[default0]: dataset 8, input: 0.000110214, achieved: 0.000110257
[default0]: dataset 9, input: 0.00549238, achieved: 0.00549235
[default0]: dataset 10, input: 0.000402122, achieved: 0.000402094
[default0]: dataset 11, input: 0.00747007, achieved: 0.00747007
[default0]: dataset 12, input: 0.000619047, achieved: 0.000619024
[default0]: dataset 13, input: 0.00103353, achieved: 0.0010336
[default0]: dataset 14, input: 0.000501201, achieved: 0.000501226
[default0]: dataset 15, input: 0.000667277, achieved: 0.000667231
[default0]: dataset 16, input: 0.000359281, achieved: 0.000359326
[default0]: dataset 17, input: 0.000508443, achieved: 0.000508519
[default0]: dataset 18, input: 0.00211373, achieved: 0.0021138
[default0]: dataset 19, input: 0.000912995, achieved: 0.000912961
[default0]: dataset 20, input: 0.00124543, achieved: 0.00124546
[default0]: dataset 21, input: 0.000315887, achieved: 0.00031594
[default0]: dataset 22, input: 0.0813721, achieved: 0.0813721
[default0]: dataset 23, input: 0.0552939, achieved: 0.0552939
[default0]: dataset 24, input: 0.0495415, achieved: 0.0495414
[default0]: dataset 25, input: 0.0246164, achieved: 0.0246163
[default0]: dataset 26, input: 0.120917, achieved: 0.120917
[default0]: dataset 27, input: 0.000517703, achieved: 0.000517666
[default0]:> elapsed time for building blendable dataset indices: 0.33 (sec)
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.022662 seconds
[default0]: number of documents: 2940097
[default0]: > dataset split:
[default0]: valid:
[default0]: document indices in [0, 2940097) total of 2940097 documents
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.010585 seconds
[default0]: number of documents: 2940097
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.002638 seconds
[default0]: number of documents: 2940097
[default0]: > WARNING: could not find index map files, building the indices on rank 0 ...
[default0]:Skipping sample id=2746508. Maximum sequence length: 2049, sample length: 3712
[default0]:Skipping sample id=2498573. Maximum sequence length: 2049, sample length: 3115
[default0]:Skipping sample id=2730344. Maximum sequence length: 2049, sample length: 3811
[default0]:Skipping sample id=2750301. Maximum sequence length: 2049, sample length: 3819
[default0]:Skipping sample id=2731299. Maximum sequence length: 2049, sample length: 2986
[default0]:Skipping sample id=2714242. Maximum sequence length: 2049, sample length: 2969
[default0]:Skipping sample id=2713924. Maximum sequence length: 2049, sample length: 3505
[default0]:Skipping sample id=2747869. Maximum sequence length: 2049, sample length: 2340
[default0]:Skipping sample id=2753271. Maximum sequence length: 2049, sample length: 2345
[default0]:Skipping sample id=2711240.
Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2744723. Maximum sequence length: 2049, sample length: 5812 [default0]:Skipping sample id=2479868. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2731521. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2495015. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2744436. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2752411. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2742577. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2469193. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2752482. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2742212. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2734408. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2736213. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2733415. Maximum sequence length: 2049, sample length: 4316 [default0]:Skipping sample id=2729647. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2755482. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2738429. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2712050. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2751455. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2755865. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2733640. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2734447. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2756342. 
Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2754189. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2730048. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2751878. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2738005. Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2743100. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2713280. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2712593. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2723920. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2722898. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725270. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2750860. Maximum sequence length: 2049, sample length: 3712 [default0]:Skipping sample id=2750574. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2736057. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2469090. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2717097. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2746315. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2745382. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2754173. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2752874. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2725835. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2721364. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2745381. 
Maximum sequence length: 2049, sample length: 4733 [default0]:Skipping sample id=2493897. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2718904. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2753262. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2714107. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2737530. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2752402. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2726763. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2746971. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2734931. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2739617. Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2711249. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2732088. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2731765. Maximum sequence length: 2049, sample length: 5191 [default0]:Skipping sample id=2746246. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2755992. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2746176. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2747268. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2716871. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719001. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2721873. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2733691. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2725866. 
Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2739505. Maximum sequence length: 2049, sample length: 5103 [default0]:Skipping sample id=2724619. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2487536. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2724911. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2734427. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2752875. Maximum sequence length: 2049, sample length: 3691 [default0]:Skipping sample id=2489826. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2724000. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2478125. Maximum sequence length: 2049, sample length: 3007 [default0]:Skipping sample id=2711255. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2716183. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2741528. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2712664. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2755045. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2486058. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2733686. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2732603. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2750972. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2715768. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2748551. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2470068. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2752649. 
Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2746093. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2744785. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2489016. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2725618. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2482421. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2755842. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2744431. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2467294. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2732906. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2741281. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2719160. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2713897. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2719257. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2716249. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2710979. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2734690. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2488275. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2742821. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2720099. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2734547. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2725360. Maximum sequence length: 2049, sample length: 4311 [default0]:Skipping sample id=2478326. 
Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2713784. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2497463. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2491575. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2752318. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2724041. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2743099. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2755037. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2721473. Maximum sequence length: 2049, sample length: 7336 [default0]:Skipping sample id=2720033. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2738900. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2754346. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2722625. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2747667. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2755872. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2741834. Maximum sequence length: 2049, sample length: 4643 [default0]:Skipping sample id=2751052. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739708. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2717021. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2711663. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2727631. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2721033. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2747844. 
Maximum sequence length: 2049, sample length: 5140 [default0]:Skipping sample id=2716041. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2749654. Maximum sequence length: 2049, sample length: 5560 [default0]:Skipping sample id=2756437. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2733702. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2735027. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2723513. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2741387. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2746270. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2756435. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2735511. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2748802. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2716835. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2724192. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2746200. Maximum sequence length: 2049, sample length: 3992 [default0]:Skipping sample id=2728036. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2714333. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2742804. Maximum sequence length: 2049, sample length: 14222 [default0]:Skipping sample id=2718936. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2468935. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2712434. Maximum sequence length: 2049, sample length: 3290 [default0]:Skipping sample id=2749277. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2753119. 
Maximum sequence length: 2049, sample length: 4222 [default0]:Skipping sample id=2727559. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2745987. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2756825. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2746239. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2746695. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2732577. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2731842. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2726472. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2755820. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2749525. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2752291. Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2713473. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2737713. Maximum sequence length: 2049, sample length: 3926 [default0]:Skipping sample id=2714772. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2728792. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2731375. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2715919. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2743897. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2739530. Maximum sequence length: 2049, sample length: 4011 [default0]:Skipping sample id=2740961. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2467971. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2726333. 
Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2721807. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2719944. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2466897. Maximum sequence length: 2049, sample length: 2689 [default0]:Skipping sample id=2741291. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2711878. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2739494. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2716243. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2711885. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2731379. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2744541. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2755330. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2718262. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2734950. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2753633. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2723753. Maximum sequence length: 2049, sample length: 3766 [default0]:Skipping sample id=2755671. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2717209. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2744874. Maximum sequence length: 2049, sample length: 3853 [default0]:Skipping sample id=2736498. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2740560. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2721170. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740400. 
Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2748758. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2488096. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2714475. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2752343. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2725039. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2478547. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2725938. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2747338. Maximum sequence length: 2049, sample length: 3682 [default0]:Skipping sample id=2737078. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2749397. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2711434. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2751562. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2726423. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2726262. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2714421. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2728738. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2720767. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2493206. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2723882. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2714018. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2737867. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2728187. 
Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2737217. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2740818. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2720440. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2739938. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2716197. Maximum sequence length: 2049, sample length: 4041 [default0]:Skipping sample id=2718729. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2731164. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2719723. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2733238. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2749966. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2746151. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2755573. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2738283. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2748917. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2711085. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726385. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2754944. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2751501. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2737134. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2736970. Maximum sequence length: 2049, sample length: 4163 [default0]:Skipping sample id=2724661. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2711158. 
Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2490005. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2732582. Maximum sequence length: 2049, sample length: 6428 [default0]:Skipping sample id=2715310. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2743660. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2739633. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2730071. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2722523. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2748360. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2715433. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2495000. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2466495. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2743974. Maximum sequence length: 2049, sample length: 2809 [default0]:Skipping sample id=2736882. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2731089. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2748803. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2725852. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2726238. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2736751. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2737303. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2715520. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2748419. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2726459. 
Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2490518. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2722331. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2469787. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2727266. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2722045. Maximum sequence length: 2049, sample length: 6941 [default0]:Skipping sample id=2729211. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2745005. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2734178. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2716281. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2743935. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2733564. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2739826. Maximum sequence length: 2049, sample length: 5858 [default0]:Skipping sample id=2731954. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2723781. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2740945. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2746763. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2719046. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2731490. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2752917. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2728846. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2734978. Maximum sequence length: 2049, sample length: 5155 [default0]:Skipping sample id=2722545. 
Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2746531. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2726665. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2479089. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2491119. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2711195. Maximum sequence length: 2049, sample length: 4638 [default0]:Skipping sample id=2738158. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2485304. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2731495. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2741396. Maximum sequence length: 2049, sample length: 3789 [default0]:Skipping sample id=2487956. Maximum sequence length: 2049, sample length: 2744 [default0]:Skipping sample id=2716087. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2714541. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2725700. Maximum sequence length: 2049, sample length: 3586 [default0]:Skipping sample id=2748894. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2743877. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2735138. Maximum sequence length: 2049, sample length: 4523 [default0]:Skipping sample id=2740267. Maximum sequence length: 2049, sample length: 4398 [default0]:Skipping sample id=2723599. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2724118. Maximum sequence length: 2049, sample length: 5380 [default0]:Skipping sample id=2713114. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2738034. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2723798. 
Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2747526. Maximum sequence length: 2049, sample length: 3324 [default0]:Skipping sample id=2721454. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2749172. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2756777. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2715928. Maximum sequence length: 2049, sample length: 4567 [default0]:Skipping sample id=2750909. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2737223. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2735659. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2736693. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2716940. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2716534. Maximum sequence length: 2049, sample length: 6073 [default0]:Skipping sample id=2734173. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2738079. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2740490. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2755092. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2754538. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2717957. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2490293. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2746410. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2754330. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735128. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2718299. 
Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2746867. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2736262. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2712285. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2736760. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2730632. Maximum sequence length: 2049, sample length: 3799 [default0]:Skipping sample id=2727295. Maximum sequence length: 2049, sample length: 4453 [default0]:Skipping sample id=2744513. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2711486. Maximum sequence length: 2049, sample length: 5187 [default0]:Skipping sample id=2743696. Maximum sequence length: 2049, sample length: 4028 [default0]:Skipping sample id=2745759. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2713780. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2731262. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2737368. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2743263. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2727806. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2724346. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2756583. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2749455. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2722827. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2722082. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2712074. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2749920. 
Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2731576. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2712017. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2720469. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2730955. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2481119. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2718441. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2744088. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2713014. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2726700. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2748380. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2716492. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2716735. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2712897. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2734371. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2726752. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2725034. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2730988. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2715881. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2712119. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2715370. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2747867. Maximum sequence length: 2049, sample length: 3954 [default0]:Skipping sample id=2753225. 
Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2741523. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2716413. Maximum sequence length: 2049, sample length: 5106 [default0]:Skipping sample id=2742153. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2738240. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2751731. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2753105. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2732115. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2467110. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2719515. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2753410. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2752190. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2735522. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2716695. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2727174. Maximum sequence length: 2049, sample length: 6924 [default0]:Skipping sample id=2720755. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2730271. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2716579. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2477061. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2743507. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2712947. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2744568. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2712214. 
Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2722477. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2479392. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2731753. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2737831. Maximum sequence length: 2049, sample length: 7071 [default0]:Skipping sample id=2729059. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2724386. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2738846. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2749084. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2490372. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2746807. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2737677. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2750341. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2752974. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2479402. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2735039. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2724168. Maximum sequence length: 2049, sample length: 4098 [default0]:Skipping sample id=2751020. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2740223. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2717641. Maximum sequence length: 2049, sample length: 4386 [default0]:Skipping sample id=2717682. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2477980. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2714369. 
Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2723541. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2726311. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2732530. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2724966. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2712245. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2747273. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2495968. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2743916. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2745895. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2757101. Maximum sequence length: 2049, sample length: 4812 [default0]:Skipping sample id=2744222. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2736695. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2756038. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2733677. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2729808. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2716428. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2718218. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2479194. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2752068. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2740865. Maximum sequence length: 2049, sample length: 5861 [default0]:Skipping sample id=2733274. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2745891. 
Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2722012. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2737850. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2744439. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2723932. Maximum sequence length: 2049, sample length: 6817 [default0]:Skipping sample id=2744103. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2744685. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2469328. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2478257. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2735944. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2720783. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2714504. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2721256. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2747228. Maximum sequence length: 2049, sample length: 4170 [default0]:Skipping sample id=2755736. Maximum sequence length: 2049, sample length: 3744 [default0]:Skipping sample id=2756377. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2747561. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2739929. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2740116. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2746743. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2711286. Maximum sequence length: 2049, sample length: 4514 [default0]:Skipping sample id=2725895. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2716955. 
Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2730602. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2725173. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2742060. Maximum sequence length: 2049, sample length: 4356 [default0]:Skipping sample id=2483708. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2730393. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739226. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2723440. Maximum sequence length: 2049, sample length: 4037 [default0]:Skipping sample id=2715088. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2743795. Maximum sequence length: 2049, sample length: 6215 [default0]:Skipping sample id=2495096. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2735448. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2712727. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2488945. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2741118. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2712828. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2736005. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2745445. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2746441. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2744969. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2731393. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2752675. Maximum sequence length: 2049, sample length: 3974 [default0]:Skipping sample id=2714371. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2750822. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2753060. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2478632. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2720791. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2748338. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2726297. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2499391. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2725061. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2712344. Maximum sequence length: 2049, sample length: 4919 [default0]:Skipping sample id=2750201. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2728672. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2485497. Maximum sequence length: 2049, sample length: 3533 [default0]:Skipping sample id=2729316. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2488057. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2466937. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2718370. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2467128. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2741683. Maximum sequence length: 2049, sample length: 3493 [default0]:Skipping sample id=2752147. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2718721. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2756101. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2742486. 
Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2724743. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2720549. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2726222. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2725855. Maximum sequence length: 2049, sample length: 5264 [default0]:Skipping sample id=2726182. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2748366. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2743634. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2738134. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2727214. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2734043. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2733230. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2745494. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2741196. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2711174. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2478694. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2743164. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2725000. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2489295. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2480436. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2750429. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2750468. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2744705. 
Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2725369. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2752538. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2737764. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2722575. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2721940. Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2734623. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489511. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2724150. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2730501. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2494721. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2755447. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2724521. Maximum sequence length: 2049, sample length: 6646 [default0]:Skipping sample id=2487535. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2743019. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2750690. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2719737. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2728377. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2484268. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2494655. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2751596. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2717700. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2481864. 
Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2745909. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2734641. Maximum sequence length: 2049, sample length: 4590 [default0]:Skipping sample id=2741151. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2716145. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2481031. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2751025. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2726400. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2495956. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2477673. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2492275. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2727925. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2718382. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2717435. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2468729. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2730455. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2468295. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2715989. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2756623. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2491422. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2742380. Maximum sequence length: 2049, sample length: 3247 [default0]:Skipping sample id=2728722. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2752114. 
Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2717352. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2740298. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2721138. Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2755549. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2723828. Maximum sequence length: 2049, sample length: 3985 [default0]:Skipping sample id=2716370. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2712032. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2734418. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2743290. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2738567. Maximum sequence length: 2049, sample length: 4028 [default0]:Skipping sample id=2494275. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2751796. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2495219. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2722416. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2751005. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2745672. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2731425. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2712739. Maximum sequence length: 2049, sample length: 6302 [default0]:Skipping sample id=2715530. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2743036. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2751323. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2488813. 
Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2754313. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2483326. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2756975. Maximum sequence length: 2049, sample length: 6245 [default0]:Skipping sample id=2731835. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2754258. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2732695. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2722472. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2732544. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2721782. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2750549. Maximum sequence length: 2049, sample length: 3902 [default0]:Skipping sample id=2493193. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2754100. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2732616. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2727731. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2751119. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2712553. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2746919. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2498780. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2712024. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2748589. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2757052. Maximum sequence length: 2049, sample length: 4237 [default0]:Skipping sample id=2739519. 
Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2711671. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2733377. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2735862. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2496523. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2490587. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2732179. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2492464. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2718993. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2712334. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2720772. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2726764. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2487470. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2741614. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2746571. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2735283. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2724191. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2493755. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2742055. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2729740. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2498427. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2737868. Maximum sequence length: 2049, sample length: 5262 [default0]:Skipping sample id=2718679. 
Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2744611. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2485177. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2727247. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2736341. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2748404. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2753180. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2468569. Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2483751. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2725969. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2750894. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2717531. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2752451. Maximum sequence length: 2049, sample length: 6528 [default0]:Skipping sample id=2727876. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2725177. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2732635. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2727776. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2732354. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2716388. Maximum sequence length: 2049, sample length: 4411 [default0]:Skipping sample id=2728624. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2718700. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2727023. Maximum sequence length: 2049, sample length: 3556 [default0]:Skipping sample id=2754569. 
Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2718701. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2735691. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2753256. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2738943. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727982. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2754690. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2486263. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2737308. Maximum sequence length: 2049, sample length: 3993 [default0]:Skipping sample id=2746877. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2748527. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2741617. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2468273. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2730895. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2718232. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2726074. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2732599. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2485963. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2742956. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2721430. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2719760. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2479598. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2494755. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2719849. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2753951. Maximum sequence length: 2049, sample length: 3647 [default0]:Skipping sample id=2750104. Maximum sequence length: 2049, sample length: 4605 [default0]:Skipping sample id=2716385. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2738226. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2481848. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2750380. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2757083. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2725228. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2712215. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2756774. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2726933. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2745259. Maximum sequence length: 2049, sample length: 4290 [default0]:Skipping sample id=2722748. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2720060. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2716020. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2745161. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2717811. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2719360. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2728612. Maximum sequence length: 2049, sample length: 4948 [default0]:Skipping sample id=2755726. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2496967. 
Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2751520. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2729851. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2722018. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2739434. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2744855. Maximum sequence length: 2049, sample length: 3140 [default0]:Skipping sample id=2724894. Maximum sequence length: 2049, sample length: 3818 [default0]:Skipping sample id=2711058. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2731859. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2746854. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2466737. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2756611. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2488147. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2481619. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2721199. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2741284. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2750529. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2477530. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2746097. Maximum sequence length: 2049, sample length: 5487 [default0]:Skipping sample id=2751936. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2749844. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2724321. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2711470. 
Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2725705. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2731766. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2721098. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2747834. Maximum sequence length: 2049, sample length: 3652 [default0]:Skipping sample id=2733626. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2746751. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726867. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2720714. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2755686. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2736056. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2734677. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2746862. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2725951. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2736123. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2719007. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2722166. Maximum sequence length: 2049, sample length: 4470 [default0]:Skipping sample id=2744288. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2747986. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2484447. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2741367. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2734810. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2738651. 
Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2753319. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2732959. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2743863. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2755195. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2734018. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2731646. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2496288. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2736792. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2491585. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2745932. Maximum sequence length: 2049, sample length: 5957 [default0]:Skipping sample id=2713285. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2717278. Maximum sequence length: 2049, sample length: 5291 [default0]:Skipping sample id=2721209. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2466882. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2714700. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2493724. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2725040. Maximum sequence length: 2049, sample length: 3711 [default0]:Skipping sample id=2754226. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2711729. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2713047. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2719189. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2754733. 
Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2487391. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2467790. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2728435. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2715875. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2723940. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2734362. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2713232. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2734701. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2714761. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2718927. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2714904. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2718913. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2736418. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2490810. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2724696. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2731138. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2718477. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2713748. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2721751. Maximum sequence length: 2049, sample length: 4285 [default0]:Skipping sample id=2737631. Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2716615. Maximum sequence length: 2049, sample length: 5187 [default0]:Skipping sample id=2719528. 
Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2736603. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733095. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2743541. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2733748. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2752146. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2718847. Maximum sequence length: 2049, sample length: 4613 [default0]:Skipping sample id=2736520. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2716274. Maximum sequence length: 2049, sample length: 5202 [default0]:Skipping sample id=2488043. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2748952. Maximum sequence length: 2049, sample length: 3874 [default0]:Skipping sample id=2722871. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2745693. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2754505. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2722503. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2745398. Maximum sequence length: 2049, sample length: 4198 [default0]:Skipping sample id=2736544. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2741900. Maximum sequence length: 2049, sample length: 5836 [default0]:Skipping sample id=2753585. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2727961. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2499417. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2732790. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2714125. 
Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2734745. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2493618. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2744865. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2754891. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2483482. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715802. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2730721. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2735816. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2481250. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756472. Maximum sequence length: 2049, sample length: 3684 [default0]:Skipping sample id=2717090. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2719739. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2728073. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2744766. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2732114. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2732015. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2744655. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2752138. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2717378. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2744531. Maximum sequence length: 2049, sample length: 5443 [default0]:Skipping sample id=2742549. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2498346. 
Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2725380. Maximum sequence length: 2049, sample length: 3533 [default0]:Skipping sample id=2714093. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2751312. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2722627. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2754764. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2719604. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2479250. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2737320. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2731761. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2491112. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2468656. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2487619. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2738088. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2740927. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2715218. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2746956. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2725254. Maximum sequence length: 2049, sample length: 3213 [default0]:Skipping sample id=2711209. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2719916. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2744864. Maximum sequence length: 2049, sample length: 3376 [default0]:Skipping sample id=2741932. Maximum sequence length: 2049, sample length: 4699 [default0]:Skipping sample id=2732740. 
Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2724453. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2744899. Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2710990. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2484254. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2741319. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2741859. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2495692. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2738615. Maximum sequence length: 2049, sample length: 4819 [default0]:Skipping sample id=2714448. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2749801. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2468873. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2481237. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2739962. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2721765. Maximum sequence length: 2049, sample length: 4698 [default0]:Skipping sample id=2714684. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2728273. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2744009. Maximum sequence length: 2049, sample length: 4187 [default0]:Skipping sample id=2719042. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2732484. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2730820. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2749561. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2716204. 
Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2736524. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2743821. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2751767. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2721642. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2737802. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2487605. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2731870. Maximum sequence length: 2049, sample length: 3788 [default0]:Skipping sample id=2721762. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2713671. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2733011. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2736692. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2715246. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2466133. Maximum sequence length: 2049, sample length: 3166 [default0]:Skipping sample id=2734174. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2482272. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2750491. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2746016. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2726488. Maximum sequence length: 2049, sample length: 4852 [default0]:Skipping sample id=2731529. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2719543. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2736309. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2477464. 
Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2752133. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2749781. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2737909. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2735462. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2490334. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2727198. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2753771. Maximum sequence length: 2049, sample length: 6009 [default0]:Skipping sample id=2724744. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2746117. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2756881. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2488278. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2735877. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2722409. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2466492. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2726144. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2737814. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2729430. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2744355. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2731797. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2745734. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2743579. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2486431. 
Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2730149. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2755732. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2748452. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2739073. Maximum sequence length: 2049, sample length: 3449 [default0]:Skipping sample id=2714995. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2724729. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2718914. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2744202. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2470101. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2736569. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2712211. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2718753. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2738075. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2725742. Maximum sequence length: 2049, sample length: 5400 [default0]:Skipping sample id=2743163. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2755316. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2750827. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2715045. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2480445. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2734450. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2719140. Maximum sequence length: 2049, sample length: 4370 [default0]:Skipping sample id=2722661. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2753890. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2747605. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2467083. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2716773. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2749854. Maximum sequence length: 2049, sample length: 4429 [default0]:Skipping sample id=2732331. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2483880. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2710971. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2731291. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2721354. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2720532. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2748134. Maximum sequence length: 2049, sample length: 5334 [default0]:Skipping sample id=2744787. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2738442. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2752158. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2739139. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2728654. Maximum sequence length: 2049, sample length: 4950 [default0]:Skipping sample id=2718830. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2489393. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2742775. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2714260. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2495955. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737429. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2712341. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2483852. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2466786. Maximum sequence length: 2049, sample length: 4092 [default0]:Skipping sample id=2499111. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2719479. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2754183. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2752894. Maximum sequence length: 2049, sample length: 3523 [default0]:Skipping sample id=2718396. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2729055. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2744314. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2712091. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2478428. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2737894. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715854. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2713500. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2734210. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2487115. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2736133. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2469751. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2715276. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2753715. 
Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2751896. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2719453. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2749044. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2721467. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2748800. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2744580. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2735055. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2489841. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2734384. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2747018. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2754398. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2743440. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2753179. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2484488. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2723093. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2714461. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2722613. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2726027. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2748096. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2732038. Maximum sequence length: 2049, sample length: 5184 [default0]:Skipping sample id=2754086. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2719443. 
Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2715402. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2739876. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2714322. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2731618. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2752003. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2727230. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2720344. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2740276. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2756538. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2712916. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2720601. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2478265. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2739380. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2478236. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2711784. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2736114. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2716338. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2745145. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2715722. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2715856. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2734636. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2485544. 
Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2733761. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2750290. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2713502. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2716369. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2721889. Maximum sequence length: 2049, sample length: 4195 [default0]:Skipping sample id=2478252. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2714123. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2752019. Maximum sequence length: 2049, sample length: 4119 [default0]:Skipping sample id=2725798. Maximum sequence length: 2049, sample length: 4079 [default0]:Skipping sample id=2466831. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2753700. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2727966. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2754169. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2720206. Maximum sequence length: 2049, sample length: 7785 [default0]:Skipping sample id=2736086. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2718662. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2741733. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2721012. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2467292. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2497985. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2717990. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2722981. 
Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2727599. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2498730. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2754009. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2494626. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2731829. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2725577. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2468307. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2480994. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2729579. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2728886. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2491680. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2752090. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2722579. Maximum sequence length: 2049, sample length: 6499 [default0]:Skipping sample id=2726377. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2715914. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2750645. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2727682. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2753527. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2493718. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2746773. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2470030. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2466166. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2742278. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2467269. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2477830. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2470431. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2714897. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2717083. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2711647. Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2491068. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2468863. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2753787. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2483961. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2742005. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2477569. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2757032. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2731998. Maximum sequence length: 2049, sample length: 3452 [default0]:Skipping sample id=2752824. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2720569. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2740968. Maximum sequence length: 2049, sample length: 5486 [default0]:Skipping sample id=2754202. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2734066. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2755020. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2712231. 
Maximum sequence length: 2049, sample length: 5517 [default0]:Skipping sample id=2725542. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2466119. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2726955. Maximum sequence length: 2049, sample length: 3860 [default0]:Skipping sample id=2719381. Maximum sequence length: 2049, sample length: 4537 [default0]:Skipping sample id=2715598. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2724362. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2711353. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2747636. Maximum sequence length: 2049, sample length: 4315 [default0]:Skipping sample id=2482029. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2721706. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2718602. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2467226. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2740573. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2725046. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2723900. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2712352. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2711433. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2711028. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2732401. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2722330. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2497979. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2714634. 
Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2711457. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2496795. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2743483. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2742100. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2751470. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2753970. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2714477. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711765. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2498177. Maximum sequence length: 2049, sample length: 3672 [default0]:Skipping sample id=2754830. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2730500. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2750111. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2731570. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2715393. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2727392. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2752288. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2732376. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2745031. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2717365. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2753482. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2721122. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2737484. 
Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2743134. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2749992. Maximum sequence length: 2049, sample length: 5653 [default0]:Skipping sample id=2715323. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2750495. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2723824. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2715522. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2490527. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2735258. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2730562. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2722546. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2712007. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2718270. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2744191. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2726379. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2754156. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2729144. Maximum sequence length: 2049, sample length: 5573 [default0]:Skipping sample id=2730201. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2736380. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2755093. Maximum sequence length: 2049, sample length: 3783 [default0]:Skipping sample id=2744506. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2489499. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731026. 
Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2725219. Maximum sequence length: 2049, sample length: 4839 [default0]:Skipping sample id=2721257. Maximum sequence length: 2049, sample length: 4378 [default0]:Skipping sample id=2712347. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2732506. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2482656. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2713933. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2747796. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2724187. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2494437. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2492709. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2719462. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2496971. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2750153. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2723783. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2719051. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2496881. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2721233. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2720287. Maximum sequence length: 2049, sample length: 5984 [default0]:Skipping sample id=2736313. Maximum sequence length: 2049, sample length: 3706 [default0]:Skipping sample id=2723164. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2724733. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2714840. 
Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733993. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2717340. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2740690. Maximum sequence length: 2049, sample length: 4367 [default0]:Skipping sample id=2754887. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2484650. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2749440. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2727984. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2756084. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2731395. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2756888. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2737310. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2723556. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2711052. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2723511. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2492246. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2734628. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2714825. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2720726. Maximum sequence length: 2049, sample length: 5257 [default0]:Skipping sample id=2751260. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2749176. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2494329. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2711987. 
Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2726518. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2744768. Maximum sequence length: 2049, sample length: 8471 [default0]:Skipping sample id=2736746. Maximum sequence length: 2049, sample length: 4784 [default0]:Skipping sample id=2744738. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2723752. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2739122. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2719608. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2726712. Maximum sequence length: 2049, sample length: 4428 [default0]:Skipping sample id=2738688. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2725113. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2725885. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2495448. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2725684. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2495644. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2717812. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2715770. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2751547. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2711692. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2717007. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2754526. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2730306. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2754935. 
Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2725698. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2740478. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2752085. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2719837. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2489000. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2726045. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2753544. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2711566. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2720954. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2729290. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2753716. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2740627. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2733337. Maximum sequence length: 2049, sample length: 4004 [default0]:Skipping sample id=2715384. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2753836. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2738557. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2739008. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2711226. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2711007. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2733534. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2741935. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2720170. 
Maximum sequence length: 2049, sample length: 5231 [default0]:Skipping sample id=2482580. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2745589. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2752578. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2729595. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2478931. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2742940. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2466919. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2726088. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2718786. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2748850. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2740393. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2497437. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2729540. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2494758. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2751199. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2732827. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2748763. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2719063. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2754267. Maximum sequence length: 2049, sample length: 4192 [default0]:Skipping sample id=2750772. Maximum sequence length: 2049, sample length: 3486 [default0]:Skipping sample id=2753714. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2729950. 
Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2753533. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2734620. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2718719. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2722263. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2714273. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2727099. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2738748. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2719218. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2753403. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2753392. Maximum sequence length: 2049, sample length: 4870 [default0]:Skipping sample id=2751172. Maximum sequence length: 2049, sample length: 3744 [default0]:Skipping sample id=2747911. Maximum sequence length: 2049, sample length: 4446 [default0]:Skipping sample id=2714930. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2495797. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2723664. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2743179. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2735088. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2723962. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2750392. Maximum sequence length: 2049, sample length: 7513 [default0]:Skipping sample id=2753892. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2725156. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2755719. 
Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2749226. Maximum sequence length: 2049, sample length: 6167 [default0]:Skipping sample id=2714367. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2724250. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2724978. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2731566. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2745015. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2488733. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2485626. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2715948. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2494245. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2752856. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2713844. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2717923. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2727129. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2716837. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2748807. Maximum sequence length: 2049, sample length: 3958 [default0]:Skipping sample id=2749350. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2482506. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2717913. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2491819. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2741072. Maximum sequence length: 2049, sample length: 4234 [default0]:Skipping sample id=2749129. 
Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2714079. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2720013. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2754365. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2725192. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2746201. Maximum sequence length: 2049, sample length: 4236 [default0]:Skipping sample id=2714451. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2744456. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2713128. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2751570. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2749638. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2733083. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2732319. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2748566. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2715853. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2732022. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2734696. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2732358. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2468439. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2482245. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2482779. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2487808. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2736284. 
Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2711284. Maximum sequence length: 2049, sample length: 5106 [default0]:Skipping sample id=2711293. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2731508. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2729759. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2717867. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2729079. Maximum sequence length: 2049, sample length: 5115 [default0]:Skipping sample id=2721628. Maximum sequence length: 2049, sample length: 3210 [default0]:Skipping sample id=2730744. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2714318. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2723839. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2714468. Maximum sequence length: 2049, sample length: 3651 [default0]:Skipping sample id=2497966. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2713900. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2754023. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2749269. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2469209. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2732605. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2741695. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2744309. Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2478859. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2482906. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2711247. 
Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2498143. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2735806. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2745355. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2750960. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2721421. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2471263. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2722171. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2728835. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2723176. Maximum sequence length: 2049, sample length: 4941 [default0]:Skipping sample id=2741531. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2753816. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2733252. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2479555. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2755604. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2736303. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2718427. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2718471. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2712838. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2726119. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2724007. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2738014. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2721529. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2720463. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2736699. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2742720. Maximum sequence length: 2049, sample length: 4689 [default0]:Skipping sample id=2720738. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2721004. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2750367. Maximum sequence length: 2049, sample length: 3184 [default0]:Skipping sample id=2732197. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2741179. Maximum sequence length: 2049, sample length: 3764 [default0]:Skipping sample id=2728364. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2748167. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2727569. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2747233. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2724656. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2727560. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718462. Maximum sequence length: 2049, sample length: 3981 [default0]:Skipping sample id=2741904. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2726771. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2746116. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2737032. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732847. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2719309. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2732539. 
Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2753353. Maximum sequence length: 2049, sample length: 6262 [default0]:Skipping sample id=2490666. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2716147. Maximum sequence length: 2049, sample length: 5013 [default0]:Skipping sample id=2712405. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2713143. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2738550. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2715918. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2731321. Maximum sequence length: 2049, sample length: 5350 [default0]:Skipping sample id=2485562. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2496141. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2748989. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2740788. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2756505. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2489406. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2730317. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2737073. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2735651. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2748328. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2736985. Maximum sequence length: 2049, sample length: 6488 [default0]:Skipping sample id=2726338. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2754218. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2744120. 
Maximum sequence length: 2049, sample length: 4220 [default0]:Skipping sample id=2482743. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2722355. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2711954. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2724033. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2711624. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2753106. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2726203. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2731872. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2741689. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2726501. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2753201. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2756416. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2715417. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2728229. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2731021. Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2726101. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2755811. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2727388. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2734360. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2495691. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2721024. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2737506. 
Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2715403. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2714757. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2713373. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2730927. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2490848. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2729994. Maximum sequence length: 2049, sample length: 4975 [default0]:Skipping sample id=2732730. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2721796. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2752011. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2727674. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2719901. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2751243. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2739191. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2751805. Maximum sequence length: 2049, sample length: 3340 [default0]:Skipping sample id=2749338. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2488102. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2726147. Maximum sequence length: 2049, sample length: 3884 [default0]:Skipping sample id=2468753. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2754900. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2730467. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2722910. Maximum sequence length: 2049, sample length: 4809 [default0]:Skipping sample id=2724679. 
Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2734040. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2733956. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2753887. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2754444. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2743010. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2732327. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2747901. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2725792. Maximum sequence length: 2049, sample length: 3934 [default0]:Skipping sample id=2733320. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2721018. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2746657. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2757012. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2721336. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2468632. Maximum sequence length: 2049, sample length: 3532 [default0]:Skipping sample id=2716635. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2744698. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2713524. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2732243. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2728970. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2711733. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2717797. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2752247. 
Maximum sequence length: 2049, sample length: 7153 [default0]:Skipping sample id=2749895. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2734665. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2742960. Maximum sequence length: 2049, sample length: 7070 [default0]:Skipping sample id=2727962. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2753656. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2728123. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2744478. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2742993. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2729222. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2746931. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2713728. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2491196. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2719248. Maximum sequence length: 2049, sample length: 8234 [default0]:Skipping sample id=2747584. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2753746. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2730504. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2724950. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2746056. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2742408. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2741037. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2721093. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2738328. 
Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2755211. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2494228. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2749405. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2742399. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2744671. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2727888. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2727248. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2711290. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2482343. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2744077. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2729761. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726673. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2489160. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2723758. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2741707. Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2744666. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2735444. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2745173. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2714796. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2727202. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2477580. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2711863. 
Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2735618. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2723576. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2754812. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2739402. Maximum sequence length: 2049, sample length: 5442 [default0]:Skipping sample id=2495305. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2728573. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2713992. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2725646. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2714450. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2741330. Maximum sequence length: 2049, sample length: 6153 [default0]:Skipping sample id=2738877. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2729926. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2726462. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2495560. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2716636. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2726343. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2716711. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2729645. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2720871. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2735400. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2753218. Maximum sequence length: 2049, sample length: 7077 [default0]:Skipping sample id=2712937. 
Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2730489. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2718861. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2720898. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2745143. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2737137. Maximum sequence length: 2049, sample length: 3958 [default0]:Skipping sample id=2717554. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2714652. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2745192. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2737288. Maximum sequence length: 2049, sample length: 3967 [default0]:Skipping sample id=2470145. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2747345. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2492264. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2719080. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2714353. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2490620. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2486336. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2721114. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2725522. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2494879. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2725639. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2722633. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2717226. 
Maximum sequence length: 2049, sample length: 4068 [default0]:Skipping sample id=2715065. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2734877. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2747633. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2747385. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2737504. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2731332. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2741512. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2468507. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2752268. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2718158. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2726750. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2735363. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2720486. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2723447. Maximum sequence length: 2049, sample length: 6498 [default0]:Skipping sample id=2466229. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2722544. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2736227. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2747202. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2719147. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2714174. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2751728. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2734527. 
Maximum sequence length: 2049, sample length: 4360 [default0]:Skipping sample id=2735390. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2747920. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2749583. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2750943. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2722290. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2711764. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2723929. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2723135. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2724086. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2748731. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2716546. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2744853. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2726227. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2725098. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2712192. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2724904. Maximum sequence length: 2049, sample length: 4029 [default0]:Skipping sample id=2742099. Maximum sequence length: 2049, sample length: 4163 [default0]:Skipping sample id=2750371. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2752861. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2465832. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2718215. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2738544. 
Maximum sequence length: 2049, sample length: 4198 [default0]:Skipping sample id=2752255. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2732523. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2719649. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2746169. Maximum sequence length: 2049, sample length: 5338 [default0]:Skipping sample id=2724657. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2747415. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2746375. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2495402. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2723841. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2742997. Maximum sequence length: 2049, sample length: 5262 [default0]:Skipping sample id=2736238. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2743105. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2719736. Maximum sequence length: 2049, sample length: 4985 [default0]:Skipping sample id=2720481. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2731792. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2752148. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2734464. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2715784. Maximum sequence length: 2049, sample length: 4496 [default0]:Skipping sample id=2750684. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2740997. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2732841. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2470055. 
Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2742087. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2746108. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2486999. Maximum sequence length: 2049, sample length: 4285 [default0]:Skipping sample id=2714127. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2749604. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2744097. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2753592. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2746114. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2736422. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2717413. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2750423. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2742934. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2752448. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2726327. Maximum sequence length: 2049, sample length: 5064 [default0]:Skipping sample id=2729328. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2711428. Maximum sequence length: 2049, sample length: 3289 [default0]:Skipping sample id=2747528. Maximum sequence length: 2049, sample length: 5848 [default0]:Skipping sample id=2498932. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2752105. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2749557. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2734896. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2485738. 
Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2716648. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2723889. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2735928. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2493679. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2712513. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2720625. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2717526. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2752544. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2754868. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2732267. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2752358. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2735177. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2735498. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2732857. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2751237. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2750963. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2741639. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2744367. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2482798. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2469759. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2725151. Maximum sequence length: 2049, sample length: 6633 [default0]:Skipping sample id=2735777. 
Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2753451. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2737646. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2732738. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2739648. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2713137. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2714430. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2732990. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2726060. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2741988. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2729025. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2715152. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2729337. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2728753. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2717486. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2720609. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2749621. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2755206. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2739333. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2717372. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2717696. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2746962. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2718880. 
Maximum sequence length: 2049, sample length: 5688 [default0]:Skipping sample id=2733411. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2721063. Maximum sequence length: 2049, sample length: 5360 [default0]:Skipping sample id=2729375. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2725920. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2756442. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2736999. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2728178. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2752100. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2730073. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2754657. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2491246. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2737979. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2712782. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2711840. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2716627. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2722244. Maximum sequence length: 2049, sample length: 5188 [default0]:Skipping sample id=2725503. Maximum sequence length: 2049, sample length: 3816 [default0]:Skipping sample id=2747379. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2483611. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2752246. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2755431. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2749373. 
Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2727968. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2712899. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2468414. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2494012. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2748707. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2483756. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2728515. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2734340. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2741710. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2721901. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2717513. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2496548. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2469306. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2715426. Maximum sequence length: 2049, sample length: 4157 [default0]:Skipping sample id=2718513. Maximum sequence length: 2049, sample length: 4473 [default0]:Skipping sample id=2728884. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2737586. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2718079. Maximum sequence length: 2049, sample length: 5439 [default0]:Skipping sample id=2721324. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2713477. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2718923. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2734812. 
Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2717809. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2730954. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2718506. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2734021. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2751789. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2735303. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2715173. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2714692. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2748241. Maximum sequence length: 2049, sample length: 5494 [default0]:Skipping sample id=2713235. Maximum sequence length: 2049, sample length: 4441 [default0]:Skipping sample id=2740025. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2730779. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2752467. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2713320. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2742989. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2727355. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2752919. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2712900. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2732691. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2754699. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2723356. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2716165. 
Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2740657. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2734968. Maximum sequence length: 2049, sample length: 6628 [default0]:Skipping sample id=2735910. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2716631. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2742280. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2730023. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2713225. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2721050. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2712536. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2744575. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2478230. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2749320. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2747690. Maximum sequence length: 2049, sample length: 6767 [default0]:Skipping sample id=2468790. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2723990. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2714893. Maximum sequence length: 2049, sample length: 6626 [default0]:Skipping sample id=2716174. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2754531. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2745651. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2751583. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2755938. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2718864. 
Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2737674. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2721770. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2716248. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2752836. Maximum sequence length: 2049, sample length: 4040 [default0]:Skipping sample id=2725554. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2740384. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2724466. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2477472. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2711000. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2732020. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2736912. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2755163. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2718116. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2754867. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2751253. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2727448. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2747564. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2736508. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2752162. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2716605. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2739624. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2746802. 
Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2479422. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2714201. Maximum sequence length: 2049, sample length: 4703 [default0]:Skipping sample id=2714099. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2724094. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2738410. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2717598. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2753765. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2741167. Maximum sequence length: 2049, sample length: 3943 [default0]:Skipping sample id=2739794. Maximum sequence length: 2049, sample length: 3700 [default0]:Skipping sample id=2733322. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2737774. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2725403. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2484115. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2726790. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2738108. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2726252. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2713175. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2722189. Maximum sequence length: 2049, sample length: 3764 [default0]:Skipping sample id=2478438. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2494608. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2731622. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2727812. 
Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2716084. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2731203. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2730792. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2740991. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2471062. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2753062. Maximum sequence length: 2049, sample length: 14228 [default0]:Skipping sample id=2471043. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2733921. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2725814. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2728876. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2729867. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2745172. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2486685. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2715334. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2747341. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2490277. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2743990. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2718315. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2726346. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2732822. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2751694. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2752647. 
Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2731741. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2741826. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2730697. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2753156. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2746074. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2724140. Maximum sequence length: 2049, sample length: 4677 [default0]:Skipping sample id=2729269. Maximum sequence length: 2049, sample length: 4660 [default0]:Skipping sample id=2724916. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2733354. Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2756953. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2731827. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2725437. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2746386. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2477571. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2750347. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2718291. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2727394. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2749565. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2733220. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2728632. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2754815. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2744672. 
Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2745224. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2496678. Maximum sequence length: 2049, sample length: 3417 [default0]:Skipping sample id=2731511. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2745572. Maximum sequence length: 2049, sample length: 5058 [default0]:Skipping sample id=2741579. Maximum sequence length: 2049, sample length: 5234 [default0]:Skipping sample id=2720479. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2724056. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2714617. Maximum sequence length: 2049, sample length: 6614 [default0]:Skipping sample id=2731268. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2471100. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2715828. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2716108. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2466517. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2714180. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2470007. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2721022. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2752707. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2746815. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2731630. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2727139. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2468431. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2731647. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2727712. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2743091. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2754176. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2715026. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2742276. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2729913. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2734248. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2748330. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2738072. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2744011. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2496904. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2715083. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2483008. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2739674. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2490281. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2749216. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2479507. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2734488. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2744707. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2743280. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2750196. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2721331. 
Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2723301. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2716037. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2747068. Maximum sequence length: 2049, sample length: 5711 [default0]:Skipping sample id=2722117. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2734605. Maximum sequence length: 2049, sample length: 5807 [default0]:Skipping sample id=2732664. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2712789. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2733879. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2729506. Maximum sequence length: 2049, sample length: 8224 [default0]:Skipping sample id=2751104. Maximum sequence length: 2049, sample length: 3918 [default0]:Skipping sample id=2746089. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2745068. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2712690. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2721981. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2719600. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2488279. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711992. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2728542. Maximum sequence length: 2049, sample length: 7200 [default0]:Skipping sample id=2746628. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2750066. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2720996. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2750868. 
Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2745953. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2721047. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2722020. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2738193. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2741538. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2498934. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2499181. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2733653. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2712397. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2746678. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2739495. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2753910. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2741633. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2740764. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2732566. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2740797. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2739776. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2713701. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2733851. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2715009. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2734588. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2711945. 
Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2721043. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2488824. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2736076. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2742933. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2720920. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2718822. Maximum sequence length: 2049, sample length: 4981 [default0]:Skipping sample id=2478810. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2743361. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2728750. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2711739. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2727762. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2716460. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2726596. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2734260. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2726457. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2738071. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2740906. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2490711. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2750891. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2737021. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2749570. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2750520. 
Maximum sequence length: 2049, sample length: 6487 [default0]:Skipping sample id=2744289. Maximum sequence length: 2049, sample length: 5103 [default0]:Skipping sample id=2711072. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2751330. Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2744430. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2491885. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2727477. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2739408. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2743823. Maximum sequence length: 2049, sample length: 3866 [default0]:Skipping sample id=2746768. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2492866. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2743907. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2736864. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2715947. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2478768. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2715778. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2736553. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2716277. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2722913. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2729251. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2727837. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2497296. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2722424. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2745017. Maximum sequence length: 2049, sample length: 4951 [default0]:Skipping sample id=2751735. Maximum sequence length: 2049, sample length: 2948 [default0]:Skipping sample id=2740210. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2751978. Maximum sequence length: 2049, sample length: 3070 [default0]:Skipping sample id=2729449. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2489257. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2723539. Maximum sequence length: 2049, sample length: 3920 [default0]:Skipping sample id=2734496. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2746963. Maximum sequence length: 2049, sample length: 4650 [default0]:Skipping sample id=2746734. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2725327. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2757026. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2756421. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2726094. Maximum sequence length: 2049, sample length: 3663 [default0]:Skipping sample id=2490022. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2754783. Maximum sequence length: 2049, sample length: 5558 [default0]:Skipping sample id=2748284. Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2756885. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2739085. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2752363. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2720931. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2743453. 
Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2724719. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2718007. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2731464. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2753407. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2748463. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2713720. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2720516. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2735121. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2752265. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2724217. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2716776. Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2744558. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2479875. Maximum sequence length: 2049, sample length: 3528 [default0]:Skipping sample id=2482192. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2746542. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2726766. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2738699. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2494726. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2731266. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2720819. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712173. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2711970. 
Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2487739. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2488982. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2741324. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2754624. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2720761. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2725193. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2736916. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2725901. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2714701. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2749184. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2484965. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2722353. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2746053. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2732502. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2719267. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2716111. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2742127. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2725341. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2499022. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2481197. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2734310. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2719308. 
Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2716374. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2722407. Maximum sequence length: 2049, sample length: 5298 [default0]:Skipping sample id=2748642. Maximum sequence length: 2049, sample length: 6011 [default0]:Skipping sample id=2722497. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2740549. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2724263. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2723089. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2735680. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2717848. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2477337. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2466283. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2733931. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2727030. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2718584. Maximum sequence length: 2049, sample length: 5717 [default0]:Skipping sample id=2749813. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2742932. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2717288. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2713722. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2711502. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2479776. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2717716. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2755964. 
Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2755722. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2716235. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2740680. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2752521. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2471200. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2479225. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2741725. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2721015. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2720006. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2723047. Maximum sequence length: 2049, sample length: 4092 [default0]:Skipping sample id=2494917. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2714646. Maximum sequence length: 2049, sample length: 4022 [default0]:Skipping sample id=2748232. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2467388. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2740503. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2745297. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2728282. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2718292. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2735180. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2732685. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2724852. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2740134. 
Maximum sequence length: 2049, sample length: 7506 [default0]:Skipping sample id=2478785. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2485719. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2713552. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2738760. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2721412. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2726015. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2738840. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2728760. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2713515. Maximum sequence length: 2049, sample length: 5258 [default0]:Skipping sample id=2717517. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2489966. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2716065. Maximum sequence length: 2049, sample length: 5988 [default0]:Skipping sample id=2748228. Maximum sequence length: 2049, sample length: 3676 [default0]:Skipping sample id=2734502. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2483507. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2743690. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2725663. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2480116. Maximum sequence length: 2049, sample length: 3331 [default0]:Skipping sample id=2752308. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2728408. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2747372. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2730001. 
Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2719969. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2741802. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2745253. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2724457. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2728622. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2715654. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2756184. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2481810. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2738778. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2713314. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2724688. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2730816. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2721149. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2745404. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2498635. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2715608. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2755403. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2740345. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2745148. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2725223. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2740048. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2712682. 
Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2736478. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2740714. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2711119. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2731133. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2478134. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2739428. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2733982. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2735900. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2731574. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2739583. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2497199. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2750052. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2738080. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2721357. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2726718. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2753832. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2738582. Maximum sequence length: 2049, sample length: 3142 [default0]:Skipping sample id=2754203. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2743635. Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2749697. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2740125. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2733592. 
Maximum sequence length: 2049, sample length: 4858 [default0]:Skipping sample id=2498889. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2718099. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2712743. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2752227. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2492943. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2742603. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2755997. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2485270. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2756689. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2721189. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2737118. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2723452. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2732094. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2744780. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2716932. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2469098. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2736197. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2719169. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2739846. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2750253. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2717778. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2726613. 
Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2487506. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2488716. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2744845. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2727134. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2742349. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2730348. Maximum sequence length: 2049, sample length: 4822 [default0]:Skipping sample id=2751802. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2736183. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2752847. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2729195. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2492053. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2729486. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2713980. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2713533. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2743822. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2477376. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2478512. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2735575. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2718448. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2750488. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2718851. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2730566. 
Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2743363. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2482802. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2754777. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2747339. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2747585. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2736810. Maximum sequence length: 2049, sample length: 5703 [default0]:Skipping sample id=2745823. Maximum sequence length: 2049, sample length: 3655 [default0]:Skipping sample id=2470376. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2750179. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2753626. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2490639. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2753853. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2729170. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2712068. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2736084. Maximum sequence length: 2049, sample length: 4164 [default0]:Skipping sample id=2711127. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2726835. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2727571. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2735897. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2485437. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2715832. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2470059. 
Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2737493. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2466351. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2738219. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2753821. Maximum sequence length: 2049, sample length: 3941 [default0]:Skipping sample id=2716595. Maximum sequence length: 2049, sample length: 5789 [default0]:Skipping sample id=2743571. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2730702. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2711005. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2730551. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2753660. Maximum sequence length: 2049, sample length: 5158 [default0]:Skipping sample id=2744895. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2737965. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2737448. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2721738. Maximum sequence length: 2049, sample length: 4266 [default0]:Skipping sample id=2753053. Maximum sequence length: 2049, sample length: 3552 [default0]:Skipping sample id=2754502. Maximum sequence length: 2049, sample length: 6058 [default0]:Skipping sample id=2755212. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2477739. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2714373. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2753252. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2742356. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2728455. 
Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2755203. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2752624. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2747410. Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2716187. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2741997. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2723804. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2736394. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2489558. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2722847. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2755617. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2492621. Maximum sequence length: 2049, sample length: 4275 [default0]:Skipping sample id=2736294. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2727138. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2739939. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2734956. Maximum sequence length: 2049, sample length: 6853 [default0]:Skipping sample id=2729165. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2753506. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2723670. Maximum sequence length: 2049, sample length: 3033 [default0]:Skipping sample id=2756800. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2757006. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2495439. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2746191. 
Maximum sequence length: 2049, sample length: 4551 [default0]:Skipping sample id=2749961. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2729720. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2745607. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2477526. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2733905. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2723527. Maximum sequence length: 2049, sample length: 3488 [default0]:Skipping sample id=2730245. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2739270. Maximum sequence length: 2049, sample length: 3718 [default0]:Skipping sample id=2713144. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2495378. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2753327. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2479450. Maximum sequence length: 2049, sample length: 4323 [default0]:Skipping sample id=2729912. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2729381. Maximum sequence length: 2049, sample length: 8496 [default0]:Skipping sample id=2730645. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2744945. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2736720. Maximum sequence length: 2049, sample length: 4545 [default0]:Skipping sample id=2721168. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2739576. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2744268. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2729801. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2484383. 
Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2481377. Maximum sequence length: 2049, sample length: 4280 [default0]:Skipping sample id=2720393. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2724684. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2467681. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2468179. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2748822. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2718430. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2712805. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2714885. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2754277. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2743850. Maximum sequence length: 2049, sample length: 4809 [default0]:Skipping sample id=2737385. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2730336. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2746805. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2735297. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2744939. Maximum sequence length: 2049, sample length: 5167 [default0]:Skipping sample id=2490901. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2725682. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2744894. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2494182. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2755090. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2717579. 
Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2718931. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2714767. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2467014. Maximum sequence length: 2049, sample length: 3593 [default0]:Skipping sample id=2466208. Maximum sequence length: 2049, sample length: 3519 [default0]:Skipping sample id=2714833. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2713162. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2742977. Maximum sequence length: 2049, sample length: 5801 [default0]:Skipping sample id=2477307. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2731426. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2727980. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2728101. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2715203. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2728707. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2748291. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2753751. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741154. Maximum sequence length: 2049, sample length: 5046 [default0]:Skipping sample id=2742583. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2722481. Maximum sequence length: 2049, sample length: 5465 [default0]:Skipping sample id=2740371. Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2745711. Maximum sequence length: 2049, sample length: 3464 [default0]:Skipping sample id=2741573. Maximum sequence length: 2049, sample length: 4385 [default0]:Skipping sample id=2728860. 
Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2471128. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2732765. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2716112. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2733242. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2746391. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2493536. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2470134. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2467722. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2470594. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2739327. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2482366. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2731219. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2736663. Maximum sequence length: 2049, sample length: 4838 [default0]:Skipping sample id=2483212. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2746161. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2731061. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2735454. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2490209. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2495627. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2486902. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2471296. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2487430. 
Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2726893. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2728857. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2721216. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2722213. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2713972. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2753213. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2729918. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2743375. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2739237. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2749453. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2744953. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2716379. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2718252. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2726083. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2728939. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751550. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2739104. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2749422. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2712666. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2713600. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2751297. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2716049. 
Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2731725. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2726086. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2711402. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2468067. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2752180. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2485800. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2727373. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2742064. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2753979. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2730871. Maximum sequence length: 2049, sample length: 5383 [default0]:Skipping sample id=2733123. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2722610. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2724516. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2728881. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2712733. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2487155. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2496046. Maximum sequence length: 2049, sample length: 3014 [default0]:Skipping sample id=2751944. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2753759. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2742744. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2751582. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2715729. 
Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2736153. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2719719. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2715889. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2493687. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2728733. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2720814. Maximum sequence length: 2049, sample length: 6345 [default0]:Skipping sample id=2732863. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2736113. Maximum sequence length: 2049, sample length: 5247 [default0]:Skipping sample id=2724518. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2728145. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2722275. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2731156. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2751625. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2736196. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2722406. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2732305. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2754121. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2734786. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2713470. Maximum sequence length: 2049, sample length: 6151 [default0]:Skipping sample id=2728511. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2719236. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2721857. 
Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2735694. Maximum sequence length: 2049, sample length: 5980 [default0]:Skipping sample id=2737131. Maximum sequence length: 2049, sample length: 5012 [default0]:Skipping sample id=2740235. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2466041. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2467024. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2477165. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2734853. Maximum sequence length: 2049, sample length: 5544 [default0]:Skipping sample id=2722228. Maximum sequence length: 2049, sample length: 4006 [default0]:Skipping sample id=2744744. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2469796. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2722366. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745400. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2485038. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2718833. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2466899. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2714296. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2728882. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2748893. Maximum sequence length: 2049, sample length: 2970 [default0]:Skipping sample id=2754982. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2750411. Maximum sequence length: 2049, sample length: 4296 [default0]:Skipping sample id=2727249. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2753936. 
Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2741644. Maximum sequence length: 2049, sample length: 3552 [default0]:Skipping sample id=2725478. Maximum sequence length: 2049, sample length: 5971 [default0]:Skipping sample id=2726090. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2716811. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2735065. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2716044. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2494808. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2718135. Maximum sequence length: 2049, sample length: 5042 [default0]:Skipping sample id=2727506. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2726663. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2722096. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2754795. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718803. Maximum sequence length: 2049, sample length: 6810 [default0]:Skipping sample id=2747023. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2727556. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2737881. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2488319. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2742769. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2756458. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2736971. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2481707. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2742886. 
Maximum sequence length: 2049, sample length: 4901 [default0]:Skipping sample id=2716182. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2719090. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2742435. Maximum sequence length: 2049, sample length: 3220 [default0]:Skipping sample id=2725132. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2719710. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2712159. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2715315. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2713859. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2724551. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2734209. Maximum sequence length: 2049, sample length: 4971 [default0]:Skipping sample id=2715006. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2493519. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2731568. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2717730. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2490519. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2756950. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2712765. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2713584. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2736632. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2734088. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2756864. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2494507. 
Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2720820. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2738466. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2755610. Maximum sequence length: 2049, sample length: 3666 [default0]:Skipping sample id=2754712. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2756743. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2722226. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2725653. Maximum sequence length: 2049, sample length: 5549 [default0]:Skipping sample id=2718661. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2745323. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2750564. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2714531. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2717190. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2497522. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2747460. Maximum sequence length: 2049, sample length: 4021 [default0]:Skipping sample id=2749369. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2749475. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2735210. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2755038. Maximum sequence length: 2049, sample length: 4008 [default0]:Skipping sample id=2466143. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2727566. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2712087. Maximum sequence length: 2049, sample length: 4332 [default0]:Skipping sample id=2744293. 
Maximum sequence length: 2049, sample length: 3924 [default0]:Skipping sample id=2739252. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2735292. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2750023. Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2714258. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2745780. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2749970. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2471165. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2741621. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2743839. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2721919. Maximum sequence length: 2049, sample length: 3285 [default0]:Skipping sample id=2749495. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2714394. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2499110. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2489010. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2727935. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2734952. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2483879. Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2738436. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2745746. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2732735. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2737857. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2722250. 
Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2742029. Maximum sequence length: 2049, sample length: 4860 [default0]:Skipping sample id=2755613. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2482612. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2741438. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2498639. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2719775. Maximum sequence length: 2049, sample length: 4022 [default0]:Skipping sample id=2726554. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2755909. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2736280. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2712931. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2729231. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2753243. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2726502. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2484823. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2749099. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2721213. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2471268. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2753548. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2719379. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2738880. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2729162. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2715556. 
Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2483993. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2731390. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2746777. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2728646. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2739998. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2711276. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2720186. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2732208. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2744468. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2752534. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2493153. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2477818. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2498855. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2716856. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2492901. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2736723. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2736055. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2744244. Maximum sequence length: 2049, sample length: 6055 [default0]:Skipping sample id=2724108. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2712806. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2713835. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2744846. 
Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2724948. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2722112. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2716689. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2743456. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2739195. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2729458. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2725960. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2718464. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2714810. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2727768. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2746124. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2756149. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2715839. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2747290. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2726345. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2739192. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2732563. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2752514. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2754312. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2471001. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725267. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2733789. 
Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2729041. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2753859. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2742489. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2728325. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2722694. Maximum sequence length: 2049, sample length: 4555 [default0]:Skipping sample id=2495805. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2712112. Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2487891. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2481185. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2735425. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2498614. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2467367. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2485723. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2725811. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2712052. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2725607. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2738002. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2753399. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2488065. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2732231. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2729010. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2712722. 
Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2725550. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2478600. Maximum sequence length: 2049, sample length: 3616 [default0]:Skipping sample id=2738029. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2735827. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2716893. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2729975. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2736067. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2744271. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2731238. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2751542. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2749974. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2748730. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2722520. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2494333. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2484060. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2718051. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2484021. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2729150. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2723144. Maximum sequence length: 2049, sample length: 3753 [default0]:Skipping sample id=2721485. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2731420. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2725917. 
Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2746693. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2731575. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2747783. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2725201. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2716553. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2734041. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2712005. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2498927. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2732458. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2750442. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2725103. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2738762. Maximum sequence length: 2049, sample length: 6433 [default0]:Skipping sample id=2726460. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2715976. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2749784. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2745888. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2717879. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2752480. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2715936. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2499138. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2723959. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2746064. 
Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2724542. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2732588. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2723490. Maximum sequence length: 2049, sample length: 2912 [default0]:Skipping sample id=2755240. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2733087. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2727974. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2735576. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2715349. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2712066. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2716004. Maximum sequence length: 2049, sample length: 4214 [default0]:Skipping sample id=2727623. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2468761. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2494349. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2736578. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2724237. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2725529. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2739906. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2495634. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2732321. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2718530. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2750986. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2466808. 
Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2754781. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2732503. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2737499. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2720194. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2726686. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2711599. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2722439. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2732548. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2741625. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2716988. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2756740. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2711643. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2716844. Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2747864. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2487183. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2754771. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2738663. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2726979. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2757094. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2739523. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2746263. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2726532. 
Maximum sequence length: 2049, sample length: 5192 [default0]:Skipping sample id=2722287. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2740521. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2729494. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2733835. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2491761. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2739554. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2744886. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2731631. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2754965. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2747353. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2749144. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2722922. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2748585. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2467587. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2719829. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2726483. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2723205. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2721491. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2752351. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2499274. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2734086. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2741756. 
Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2754917. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2746113. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2479824. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2717327. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2733249. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2482532. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2477451. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2740432. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2726307. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2739254. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2751969. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2465841. Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2724962. Maximum sequence length: 2049, sample length: 3963 [default0]:Skipping sample id=2726624. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2751255. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2757085. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2479799. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2714563. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2492139. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2725890. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2728799. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2711964. 
Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2713406. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2725766. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2498463. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2729268. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2711968. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2729748. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2732163. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2722470. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2739547. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2480040. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2469232. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2728222. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2752177. Maximum sequence length: 2049, sample length: 5051 [default0]:Skipping sample id=2490325. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2754616. Maximum sequence length: 2049, sample length: 3340 [default0]:Skipping sample id=2746610. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2718286. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2726309. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2751822. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2718800. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2720957. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2743085. 
Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2713794. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2728152. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2739150. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2487844. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2728351. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2744280. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2739752. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2493848. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2728557. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2490963. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2728309. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2735723. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2752117. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2722681. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484232. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2722326. Maximum sequence length: 2049, sample length: 4055 [default0]:Skipping sample id=2466617. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2725894. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2741022. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2734380. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2751786. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2716455. 
Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2711207. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2742955. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2740759. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2733282. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2498399. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2715607. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2753378. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2494150. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2738102. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2490175. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2740935. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2724133. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2723698. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2712821. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2741055. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2467339. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2752663. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2724411. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2745141. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2716933. Maximum sequence length: 2049, sample length: 4166 [default0]:Skipping sample id=2471192. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2711938. 
Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2730074. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2754936. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2743013. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2750780. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2738926. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2745665. Maximum sequence length: 2049, sample length: 4574 [default0]:Skipping sample id=2723250. Maximum sequence length: 2049, sample length: 5388 [default0]:Skipping sample id=2740243. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2755681. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2482473. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2756316. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2741125. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2755462. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2717537. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2713070. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2741092. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2756602. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2735304. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727547. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2481279. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2724064. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2745689. 
Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2477262. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2741969. Maximum sequence length: 2049, sample length: 4347 [default0]:Skipping sample id=2742996. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2717498. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2736673. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2729255. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2727117. Maximum sequence length: 2049, sample length: 5302 [default0]:Skipping sample id=2725281. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2467426. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2714792. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2744683. Maximum sequence length: 2049, sample length: 3456 [default0]:Skipping sample id=2750511. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2486696. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2748187. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2737462. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2743798. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2487993. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2727374. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2736285. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2717704. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2748292. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2718093. 
Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2741790. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2749186. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2750892. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2715059. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2713961. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2489129. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2480004. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2746762. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2740588. Maximum sequence length: 2049, sample length: 3350 [default0]:Skipping sample id=2714938. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2743388. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2493180. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2747893. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2717258. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2734566. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2718337. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2739080. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2726122. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2734577. Maximum sequence length: 2049, sample length: 4026 [default0]:Skipping sample id=2471337. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2756541. Maximum sequence length: 2049, sample length: 5052 [default0]:Skipping sample id=2734609. 
Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2718570. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2741773. Maximum sequence length: 2049, sample length: 4262 [default0]:Skipping sample id=2733076. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2468149. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2471341. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2736881. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2746874. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2723157. Maximum sequence length: 2049, sample length: 4333 [default0]:Skipping sample id=2719835. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2756362. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2739545. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2729855. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2480105. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2749717. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2720173. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2483364. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2741905. Maximum sequence length: 2049, sample length: 4671 [default0]:Skipping sample id=2486066. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2486084. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2736644. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2477885. Maximum sequence length: 2049, sample length: 2773 [default0]:Skipping sample id=2729767. 
Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2755224. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2716121. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2718918. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2716356. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2752589. Maximum sequence length: 2049, sample length: 4018 [default0]:Skipping sample id=2719813. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2736819. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2723543. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2752060. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2717237. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2711754. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744753. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2750750. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2749851. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2749027. Maximum sequence length: 2049, sample length: 4109 [default0]:Skipping sample id=2748234. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2751176. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2718965. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2742528. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2728987. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2753098. Maximum sequence length: 2049, sample length: 6262 [default0]:Skipping sample id=2716324. 
Maximum sequence length: 2049, sample length: 4638 [default0]:Skipping sample id=2735465. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2726534. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2495928. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2492517. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2754626. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2735392. Maximum sequence length: 2049, sample length: 4312 [default0]:Skipping sample id=2754190. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2488328. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2731103. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2712147. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2744538. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2752262. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2741654. Maximum sequence length: 2049, sample length: 4753 [default0]:Skipping sample id=2717524. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2730829. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2717564. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2713384. Maximum sequence length: 2049, sample length: 4473 [default0]:Skipping sample id=2737102. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2714017. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2747745. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2740339. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2715023. 
Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2741566. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2741048. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2755233. Maximum sequence length: 2049, sample length: 4605 [default0]:Skipping sample id=2720886. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2738561. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2490554. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2738879. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2752794. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2727775. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2745096. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2486517. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2755004. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2725867. Maximum sequence length: 2049, sample length: 3920 [default0]:Skipping sample id=2740979. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2734217. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2717458. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2737384. Maximum sequence length: 2049, sample length: 3338 [default0]:Skipping sample id=2728298. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2721388. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2744935. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2736954. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2729297. 
Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2746308. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2748921. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2732139. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2484587. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2746283. Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2719120. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2745203. Maximum sequence length: 2049, sample length: 6240 [default0]:Skipping sample id=2720078. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2712047. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2716499. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2752038. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2757041. Maximum sequence length: 2049, sample length: 4758 [default0]:Skipping sample id=2740538. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2746206. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2747004. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2749914. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2724132. Maximum sequence length: 2049, sample length: 4652 [default0]:Skipping sample id=2753214. Maximum sequence length: 2049, sample length: 3908 [default0]:Skipping sample id=2741246. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718791. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2712476. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2736143. 
Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2724113. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2734220. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2756452. Maximum sequence length: 2049, sample length: 4503 [default0]:Skipping sample id=2712369. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2722300. Maximum sequence length: 2049, sample length: 3853 [default0]:Skipping sample id=2745127. Maximum sequence length: 2049, sample length: 2654 [default0]:Skipping sample id=2713993. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2756641. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2728066. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2740634. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2481657. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2749925. Maximum sequence length: 2049, sample length: 3461 [default0]:Skipping sample id=2753996. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2725884. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2749496. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2712129. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2728885. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2467703. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2756848. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2479110. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2747331. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2497538. 
Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2491900. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2487368. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2716431. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756274. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2722146. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2741230. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2726362. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2717667. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2756727. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2753211. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2745814. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2732885. Maximum sequence length: 2049, sample length: 3200 [default0]:Skipping sample id=2753084. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2481039. Maximum sequence length: 2049, sample length: 3112 [default0]:Skipping sample id=2731739. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2713103. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2755227. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2742130. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2727863. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2745989. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2741678. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2724036. 
Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2712220. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2492610. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2496416. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2735002. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2470979. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2737695. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2743349. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2728822. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2717476. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2714610. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2713850. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2479550. Maximum sequence length: 2049, sample length: 3543 [default0]:Skipping sample id=2727718. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2750184. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2745862. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2495671. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2742667. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2713440. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2733394. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2716157. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2711326. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2718227. 
Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2745427. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2711682. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2734915. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2467942. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2492588. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2468078. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2746126. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2740306. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2492946. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2737183. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2735560. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2478950. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2737202. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2750809. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2489748. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2726614. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2745990. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2744263. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2741992. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2477589. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2712398. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2723693. 
Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2748840. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2756810. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2488156. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2734147. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2751249. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2756549. Maximum sequence length: 2049, sample length: 5628 [default0]:Skipping sample id=2719314. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2724724. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2750027. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2752744. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2751560. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2728604. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2753557. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2495237. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2714063. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2736209. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2730498. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2732181. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2746291. Maximum sequence length: 2049, sample length: 5623 [default0]:Skipping sample id=2736195. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2728153. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2748281. 
Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2493445. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2756862. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2726594. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2713592. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2738841. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2712574. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2731806. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2743540. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2740492. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2724900. Maximum sequence length: 2049, sample length: 3863 [default0]:Skipping sample id=2499237. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2479682. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2493361. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2717313. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2731635. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2722738. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2743639. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2744448. Maximum sequence length: 2049, sample length: 6630 [default0]:Skipping sample id=2753843. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2728959. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2716396. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2482374. 
Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2735855. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2730018. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2732294. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2725105. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2728619. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2752607. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2715114. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2483105. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2713526. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2737553. Maximum sequence length: 2049, sample length: 4365 [default0]:Skipping sample id=2737144. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2483193. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2726295. Maximum sequence length: 2049, sample length: 3309 [default0]:Skipping sample id=2723307. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2735453. Maximum sequence length: 2049, sample length: 3714 [default0]:Skipping sample id=2727716. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2730293. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2727758. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2741071. Maximum sequence length: 2049, sample length: 3991 [default0]:Skipping sample id=2721964. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2737474. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2750609. 
Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2719389. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2730026. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2714493. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2751488. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2746774. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2717071. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2725670. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2724766. Maximum sequence length: 2049, sample length: 4433 [default0]:Skipping sample id=2737533. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2720459. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2713312. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2749218. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2720720. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2737615. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2725234. Maximum sequence length: 2049, sample length: 4642 [default0]:Skipping sample id=2747179. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2712009. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2721547. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2711053. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2496051. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2756511. Maximum sequence length: 2049, sample length: 3417 [default0]:Skipping sample id=2741673. 
Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2748641. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2712389. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2728771. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2728914. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2723991. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2737718. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2717440. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2481773. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2731044. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2743721. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2714965. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2718363. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2725828. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2720553. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2721179. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2715982. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2737793. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2728788. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732239. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2737933. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2734779. Maximum sequence length: 2049, sample length: 6629 [default0]:Skipping sample id=2493981. 
Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2715883. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2737792. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2754641. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2754075. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2726055. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2477132. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2716355. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2751182. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2729115. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2718975. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2715345. Maximum sequence length: 2049, sample length: 2971 [default0]:Skipping sample id=2714798. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2753797. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2752331. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2754613. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2722764. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2751515. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2493610. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2742267. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2736068. Maximum sequence length: 2049, sample length: 2964 [default0]:Skipping sample id=2470943. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2720212. 
Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2745124. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2742647. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2722292. Maximum sequence length: 2049, sample length: 7278 [default0]:Skipping sample id=2741850. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2749098. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2719482. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2734049. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2494661. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2711568. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2713729. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2745719. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2715536. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2723326. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2735904. Maximum sequence length: 2049, sample length: 6421 [default0]:Skipping sample id=2725149. Maximum sequence length: 2049, sample length: 4786 [default0]:Skipping sample id=2713764. Maximum sequence length: 2049, sample length: 5766 [default0]:Skipping sample id=2730683. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2745208. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2742797. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2725284. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2737253. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2725229. 
Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2489408. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2723306. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2746203. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2729069. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2717702. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2755401. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2738395. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2741477. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2749046. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2717808. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2719292. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2497589. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2723439. Maximum sequence length: 2049, sample length: 4423 [default0]:Skipping sample id=2484253. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2712643. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2734416. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2718623. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2738258. Maximum sequence length: 2049, sample length: 4341 [default0]:Skipping sample id=2496373. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2730302. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2736776. Maximum sequence length: 2049, sample length: 4669 [default0]:Skipping sample id=2736079. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2734376. Maximum sequence length: 2049, sample length: 5363 [default0]:Skipping sample id=2724437. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2496950. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2717368. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2722120. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2742024. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2487071. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2738207. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2722454. Maximum sequence length: 2049, sample length: 3078 [default0]:Skipping sample id=2731014. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2736927. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2719521. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2732819. Maximum sequence length: 2049, sample length: 6946 [default0]:Skipping sample id=2723444. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2477641. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2725298. Maximum sequence length: 2049, sample length: 3841 [default0]:Skipping sample id=2745061. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2713855. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2717654. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2745599. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2739816. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2486193. 
Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2728252. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2756076. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2734168. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2742601. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2757081. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2499405. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2723255. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2732124. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2747323. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2711652. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2722100. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2721985. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2732935. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2751769. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2741379. Maximum sequence length: 2049, sample length: 3710 [default0]:Skipping sample id=2487296. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2725788. Maximum sequence length: 2049, sample length: 3324 [default0]:Skipping sample id=2731251. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2493021. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2724984. Maximum sequence length: 2049, sample length: 4300 [default0]:Skipping sample id=2711759. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2741038. 
Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2742231. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2721998. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2742179. Maximum sequence length: 2049, sample length: 5829 [default0]:Skipping sample id=2733889. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2715944. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2735407. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2729009. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2741498. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2714664. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2739022. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2753112. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2730859. Maximum sequence length: 2049, sample length: 4491 [default0]:Skipping sample id=2713657. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715988. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2752241. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2728688. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2747501. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2736620. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2748741. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2747642. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2725569. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2488161. 
Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2727822. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2711250. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2748214. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2742352. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2728656. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2714060. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2745818. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2469828. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2741015. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2732480. Maximum sequence length: 2049, sample length: 4149 [default0]:Skipping sample id=2736051. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2739978. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2478140. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2717301. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2732913. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2712761. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2490694. Maximum sequence length: 2049, sample length: 4317 [default0]:Skipping sample id=2732761. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2493273. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498315. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2721123. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2488517. 
Maximum sequence length: 2049, sample length: 3066 [default0]:Skipping sample id=2757001. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2712684. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2716655. Maximum sequence length: 2049, sample length: 4683 [default0]:Skipping sample id=2486735. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2756951. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2752910. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2742034. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2466646. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2716292. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2735717. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2730939. Maximum sequence length: 2049, sample length: 2964 [default0]:Skipping sample id=2477735. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2713692. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2722203. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2740075. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2712101. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2732911. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2729289. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2485618. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2714762. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2741764. Maximum sequence length: 2049, sample length: 4230 [default0]:Skipping sample id=2733565. 
Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2481001. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2495508. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2714687. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2497560. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2726891. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2749651. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2491938. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2741543. Maximum sequence length: 2049, sample length: 4415 [default0]:Skipping sample id=2731311. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2751895. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733327. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2716586. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2741197. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2756699. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2736332. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2755571. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2738070. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2714874. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2720786. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2734425. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2740610. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2731243. 
Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2729711. Maximum sequence length: 2049, sample length: 4176 [default0]:Skipping sample id=2744027. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2732415. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2725303. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2755002. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2730439. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2754882. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2755131. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2749827. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2720756. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2731345. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2749486. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2713081. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2712284. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2724450. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2725927. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2725299. Maximum sequence length: 2049, sample length: 5238 [default0]:Skipping sample id=2749156. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2746058. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2755051. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2737101. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2714750. 
Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2493157. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2748322. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2713793. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2484897. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2727501. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2735319. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2726251. Maximum sequence length: 2049, sample length: 4295 [default0]:Skipping sample id=2716881. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2730789. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2725465. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2751770. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2479750. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2485984. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2499232. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2712501. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2739745. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2739987. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2720209. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2735024. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2479815. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2728337. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2729805. 
Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2736661. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2484122. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2467582. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2469201. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2712613. Maximum sequence length: 2049, sample length: 4563 [default0]:Skipping sample id=2751882. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2487136. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2712378. Maximum sequence length: 2049, sample length: 4485 [default0]:Skipping sample id=2755125. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2736863. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2735217. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2751682. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2747066. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2494826. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2481194. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726642. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2750790. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2748786. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2480423. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2753711. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737841. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2715769. 
Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2722382. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2718750. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2740107. Maximum sequence length: 2049, sample length: 5050 [default0]:Skipping sample id=2733751. Maximum sequence length: 2049, sample length: 4431 [default0]:Skipping sample id=2728815. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2477265. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2727058. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2741295. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2718595. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2741649. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2745359. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2465908. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2749266. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2750217. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2750190. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2746325. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2746961. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2737125. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2725615. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2724608. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2718978. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2736566. 
Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2750583. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2735420. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2731372. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2738978. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2719200. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2731677. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2715117. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2736561. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2737297. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2722770. Maximum sequence length: 2049, sample length: 4393 [default0]:Skipping sample id=2741050. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2482754. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2487683. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2742534. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2711489. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2725207. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2713163. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2719974. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2711469. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2721883. Maximum sequence length: 2049, sample length: 6067 [default0]:Skipping sample id=2727385. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2732979. 
Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2488459. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2720439. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2739557. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2735314. Maximum sequence length: 2049, sample length: 5225 [default0]:Skipping sample id=2713994. Maximum sequence length: 2049, sample length: 8506 [default0]:Skipping sample id=2741634. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2725548. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2735819. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2725221. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2723402. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2716325. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2713785. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2731731. Maximum sequence length: 2049, sample length: 7319 [default0]:Skipping sample id=2747788. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2731472. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2742507. Maximum sequence length: 2049, sample length: 4308 [default0]:Skipping sample id=2755500. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2712157. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2726328. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2478126. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2725690. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2743955. 
Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2740141. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2728299. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2734125. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2756245. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2734898. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2732665. Maximum sequence length: 2049, sample length: 5281 [default0]:Skipping sample id=2756913. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2736667. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2726245. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724389. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2745992. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2711399. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2727068. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2741865. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2726696. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2719550. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2737576. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2470379. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2729558. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2729863. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2740705. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2734559. 
Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2729779. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2748190. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2735995. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2753437. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2484364. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2483936. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2494638. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2729259. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2738928. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2726097. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2732375. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2483532. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2755050. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2745225. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2750642. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2486074. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2713171. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2731078. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2754914. Maximum sequence length: 2049, sample length: 4564 [default0]:Skipping sample id=2751649. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2486859. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2712918. 
Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2757107. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2478162. Maximum sequence length: 2049, sample length: 3631 [default0]:Skipping sample id=2479832. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2755452. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2738622. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2731918. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2748664. Maximum sequence length: 2049, sample length: 3576 [default0]:Skipping sample id=2722182. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2728231. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2490438. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2724169. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2727534. Maximum sequence length: 2049, sample length: 3599 [default0]:Skipping sample id=2755642. Maximum sequence length: 2049, sample length: 3542 [default0]:Skipping sample id=2713333. Maximum sequence length: 2049, sample length: 3981 [default0]:Skipping sample id=2711899. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2752994. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2747230. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2718041. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2746552. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2726206. Maximum sequence length: 2049, sample length: 2576 [default0]:Skipping sample id=2494211. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2726803. 
Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2735925. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2730713. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2739724. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2725778. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2721253. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2724534. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2733605. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2749862. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2729326. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2752762. Maximum sequence length: 2049, sample length: 5085 [default0]:Skipping sample id=2750164. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2720272. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2718908. Maximum sequence length: 2049, sample length: 5121 [default0]:Skipping sample id=2496514. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2757027. Maximum sequence length: 2049, sample length: 3112 [default0]:Skipping sample id=2722553. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2749469. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2739303. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2467304. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2731951. Maximum sequence length: 2049, sample length: 3814 [default0]:Skipping sample id=2734654. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2722596. 
Maximum sequence length: 2049, sample length: 5754 [default0]:Skipping sample id=2732521. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2747162. Maximum sequence length: 2049, sample length: 4364 [default0]:Skipping sample id=2488030. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2496642. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2737859. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2748724. Maximum sequence length: 2049, sample length: 3901 [default0]:Skipping sample id=2726471. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2730399. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2750187. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2714915. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2496690. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2746519. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2738890. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2746166. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2744428. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2716901. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2737910. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2753146. Maximum sequence length: 2049, sample length: 5321 [default0]:Skipping sample id=2753991. Maximum sequence length: 2049, sample length: 3600 [default0]:Skipping sample id=2743929. Maximum sequence length: 2049, sample length: 6967 [default0]:Skipping sample id=2724122. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2722985. 
Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2738135. Maximum sequence length: 2049, sample length: 3350 [default0]:Skipping sample id=2745254. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2718943. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2728827. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2741034. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2719303. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2747017. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2722807. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2711471. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2713535. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2722473. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713929. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2721747. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2727151. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2739837. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2491667. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2465788. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2488679. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2751409. Maximum sequence length: 2049, sample length: 2984 [default0]:Skipping sample id=2734249. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2713196. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2469870. 
Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2715690. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712904. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2746750. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2481543. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2745678. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2741757. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2740470. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2720506. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2483001. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2745800. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2727636. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732974. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2754546. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2470818. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2478820. Maximum sequence length: 2049, sample length: 3512 [default0]:Skipping sample id=2732728. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2483771. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2738050. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2746561. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2467948. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2713688. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745242. 
Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2755799. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2750397. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2734639. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2716968. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2754949. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2733783. Maximum sequence length: 2049, sample length: 4950 [default0]:Skipping sample id=2713360. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2757119. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2490494. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2470671. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2726856. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2754970. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2754509. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2477594. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2489039. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732900. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2741574. Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2744724. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2742903. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2712919. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2736956. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2714511. 
Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2718898. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2742073. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2730257. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2494455. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2739029. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2723118. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2739180. Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2713318. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2483836. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2499388. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2715560. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2728409. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2741362. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2739217. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2720274. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2726792. Maximum sequence length: 2049, sample length: 6063 [default0]:Skipping sample id=2743898. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2735581. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2738324. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2751454. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2736905. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2717415. 
Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2734317. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2735568. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2727216. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2712831. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2723593. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2748372. Maximum sequence length: 2049, sample length: 4627 [default0]:Skipping sample id=2728945. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2738871. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2713428. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2729881. Maximum sequence length: 2049, sample length: 4143 [default0]:Skipping sample id=2746077. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2479553. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2756860. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2750590. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2494604. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2722600. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2491282. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2753024. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2711073. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2736808. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2711516. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2482534. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2737024. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2737251. Maximum sequence length: 2049, sample length: 3247 [default0]:Skipping sample id=2733619. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2755956. Maximum sequence length: 2049, sample length: 4536 [default0]:Skipping sample id=2741723. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2712236. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2728906. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2481749. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2729848. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2724545. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2735373. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2480517. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2751072. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2743702. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2732218. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2746901. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2712124. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2496862. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2749268. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2735780. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2740766. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2715926. 
Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2741738. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2749140. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2721089. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2757046. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2753384. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2722157. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2754305. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2496806. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2720665. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2742798. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2741391. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2743480. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2717298. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2731132. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2741222. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2745214. Maximum sequence length: 2049, sample length: 6666 [default0]:Skipping sample id=2466976. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2746989. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2738445. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2710992. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2716290. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2741509. 
Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2466938. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2721731. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2749906. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2724784. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2739595. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2735273. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2721514. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2738786. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2484872. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2724252. Maximum sequence length: 2049, sample length: 8513 [default0]:Skipping sample id=2726386. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2734477. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2496060. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2726136. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2750285. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2752761. Maximum sequence length: 2049, sample length: 3431 [default0]:Skipping sample id=2487531. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2730909. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2487908. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2497507. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2742161. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2730984. 
Maximum sequence length: 2049, sample length: 5247 [default0]:Skipping sample id=2746482. Maximum sequence length: 2049, sample length: 2977 [default0]:Skipping sample id=2730395. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2712595. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2497415. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2497495. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2723245. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2749568. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2718656. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2753027. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2735386. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2742412. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2722272. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2712151. Maximum sequence length: 2049, sample length: 5070 [default0]:Skipping sample id=2730966. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2470823. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2499152. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2755830. Maximum sequence length: 2049, sample length: 5063 [default0]:Skipping sample id=2718210. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2479657. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2713076. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2720980. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2711832. 
Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2752893. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2744269. Maximum sequence length: 2049, sample length: 3540 [default0]:Skipping sample id=2493856. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2716077. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2728699. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2729275. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2748498. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2747308. Maximum sequence length: 2049, sample length: 5837 [default0]:Skipping sample id=2738882. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2713514. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2755782. Maximum sequence length: 2049, sample length: 3645 [default0]:Skipping sample id=2726637. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2750595. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2750777. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2739209. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2715271. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2723848. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2727704. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2718551. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2753127. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2742097. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2727792. 
Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2470222. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2727954. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2734941. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2497531. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2738181. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2732396. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2724143. Maximum sequence length: 2049, sample length: 4159 [default0]:Skipping sample id=2718006. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2741873. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2716585. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2747163. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2738862. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719321. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2749503. Maximum sequence length: 2049, sample length: 4554 [default0]:Skipping sample id=2726743. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2752123. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2711006. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2727725. Maximum sequence length: 2049, sample length: 5737 [default0]:Skipping sample id=2751650. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2711323. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2746090. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2747022. 
Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2730067. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2750971. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2721073. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2470414. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2738209. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2723484. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2727348. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2739765. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2750272. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2749031. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2713625. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2753277. Maximum sequence length: 2049, sample length: 3391 [default0]:Skipping sample id=2713693. Maximum sequence length: 2049, sample length: 4981 [default0]:Skipping sample id=2743368. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2729120. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2722220. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2715972. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744397. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2734465. Maximum sequence length: 2049, sample length: 6007 [default0]:Skipping sample id=2752905. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2757011. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2722338. 
Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2720983. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2719911. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2481493. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2728109. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2753204. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2467785. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2745423. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2742579. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2714418. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2723049. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2722921. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2754245. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2747619. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2478013. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2740929. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2722853. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2730252. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2722688. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2484134. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2724629. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2715357. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2747841. 
Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2726177. Maximum sequence length: 2049, sample length: 2957 [default0]:Skipping sample id=2739688. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2754724. Maximum sequence length: 2049, sample length: 4926 [default0]:Skipping sample id=2496818. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2487370. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2729821. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2749318. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2722816. Maximum sequence length: 2049, sample length: 4341 [default0]:Skipping sample id=2723344. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2711060. Maximum sequence length: 2049, sample length: 4823 [default0]:Skipping sample id=2723595. Maximum sequence length: 2049, sample length: 5040 [default0]:Skipping sample id=2753826. Maximum sequence length: 2049, sample length: 4281 [default0]:Skipping sample id=2728291. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2722542. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2711565. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2742614. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2722977. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2470755. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2491748. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2728933. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2729928. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2734914. 
Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2492307. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2725294. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2488173. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2743103. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2478631. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2725966. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2728093. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726880. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2498825. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2731935. Maximum sequence length: 2049, sample length: 6674 [default0]:Skipping sample id=2723053. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2756811. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2492481. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2730536. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2714100. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2715595. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2481702. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2726405. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2732546. Maximum sequence length: 2049, sample length: 14247 [default0]:Skipping sample id=2712915. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2740006. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2750396. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2727004. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2481828. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2732073. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2724052. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2730797. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2479275. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2727538. Maximum sequence length: 2049, sample length: 5060 [default0]:Skipping sample id=2483334. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2742854. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2726125. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2739447. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2720248. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2731580. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2754943. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2756474. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2735393. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2741353. Maximum sequence length: 2049, sample length: 2977 [default0]:Skipping sample id=2752323. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2723750. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2742525. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2736480. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2723079. 
Maximum sequence length: 2049, sample length: 4021 [default0]:Skipping sample id=2733255. Maximum sequence length: 2049, sample length: 6292 [default0]:Skipping sample id=2729923. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2717603. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2478778. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2723323. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2717964. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2752273. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2737449. Maximum sequence length: 2049, sample length: 4032 [default0]:Skipping sample id=2752470. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2754689. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2754906. Maximum sequence length: 2049, sample length: 4449 [default0]:Skipping sample id=2492441. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2729359. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2747811. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2732533. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2745188. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2755929. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2745362. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2744779. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2494776. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2728378. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2750607. 
Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2746105. Maximum sequence length: 2049, sample length: 3482 [default0]:Skipping sample id=2733438. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2729151. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2722102. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2741817. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2717484. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2482091. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2722708. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2747322. Maximum sequence length: 2049, sample length: 6530 [default0]:Skipping sample id=2728830. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2720298. Maximum sequence length: 2049, sample length: 3622 [default0]:Skipping sample id=2470706. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2482443. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2739969. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2748552. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2492278. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2737262. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2755995. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2717718. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2723784. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2747419. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2716008. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2718248. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2729961. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2742898. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2756886. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2725754. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2719609. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2730730. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2744814. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2466458. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2737022. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2757098. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2733435. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2719558. Maximum sequence length: 2049, sample length: 6465 [default0]:Skipping sample id=2494093. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2714040. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2719646. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2747034. Maximum sequence length: 2049, sample length: 4678 [default0]:Skipping sample id=2739664. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2483295. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2478642. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2745154. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2724378. 
Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2482737. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2486812. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2753379. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2494022. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2723401. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2753387. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2746491. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2723312. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2742503. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2712320. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2720489. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2717157. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2470464. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2736318. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2734140. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2724432. Maximum sequence length: 2049, sample length: 3522 [default0]:Skipping sample id=2465930. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2498486. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2489370. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2734919. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2744684. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2716211. 
Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2498731. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2748123. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2481514. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2487686. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2490739. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2726455. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2731330. Maximum sequence length: 2049, sample length: 6514 [default0]:Skipping sample id=2470250. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2720521. Maximum sequence length: 2049, sample length: 3467 [default0]:Skipping sample id=2712197. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2711377. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2714006. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2732528. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2723019. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2726829. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2721405. Maximum sequence length: 2049, sample length: 3938 [default0]:Skipping sample id=2756796. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2718243. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2736117. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2720773. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2466347. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2740510. 
Maximum sequence length: 2049, sample length: 3679 [default0]:Skipping sample id=2721774. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2480430. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2487328. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2753420. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2479289. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2743116. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2485551. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2712548. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2718564. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2732989. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2714255. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2717152. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2735079. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2487581. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2732708. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2750434. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2729812. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2715251. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2756347. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2713613. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2471282. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2746975. 
Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2733246. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2754598. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2753244. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2752110. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2740044. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2754447. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2751983. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2724632. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2737962. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2736565. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2721128. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2481997. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2750866. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2487450. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2487982. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2740294. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2748088. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2486294. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2723993. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2752640. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2724027. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2730350. 
Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2752251. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2711442. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2743944. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2718151. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2712108. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2715506. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2723475. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748305. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2736976. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2719222. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2725337. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2726439. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2711036. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2468841. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2723391. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2481884. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2488695. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2491217. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2751248. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2727536. Maximum sequence length: 2049, sample length: 5281 [default0]:Skipping sample id=2727445. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2717635. 
Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2471091. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2724783. Maximum sequence length: 2049, sample length: 3889 [default0]:Skipping sample id=2748168. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2723174. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2756980. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2711165. Maximum sequence length: 2049, sample length: 5826 [default0]:Skipping sample id=2716400. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2719987. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2724117. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2720448. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2727279. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2738464. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2725987. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2493034. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2483841. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2712758. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2723538. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2716423. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2730363. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2735459. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2721076. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2755897. 
Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2718502. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2745555. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2728287. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2743458. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2714869. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2740671. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2740713. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2742661. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2724777. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2726409. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2741086. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2731642. Maximum sequence length: 2049, sample length: 5927 [default0]:Skipping sample id=2721495. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2740803. Maximum sequence length: 2049, sample length: 5958 [default0]:Skipping sample id=2724325. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2735635. Maximum sequence length: 2049, sample length: 5600 [default0]:Skipping sample id=2737064. Maximum sequence length: 2049, sample length: 6522 [default0]:Skipping sample id=2711706. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2742806. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2754424. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2729409. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2734809. 
Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2753467. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2718044. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2713363. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2733644. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2720002. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2755890. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2734509. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2754645. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2727307. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2734305. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2737591. Maximum sequence length: 2049, sample length: 4688 [default0]:Skipping sample id=2751611. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2738259. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2470719. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2711526. Maximum sequence length: 2049, sample length: 4683 [default0]:Skipping sample id=2754529. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2733022. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2723724. Maximum sequence length: 2049, sample length: 3649 [default0]:Skipping sample id=2738918. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2715851. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2732764. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2718398. 
Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2729309. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2716888. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2736978. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726899. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2733420. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2737460. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2737653. Maximum sequence length: 2049, sample length: 4435 [default0]:Skipping sample id=2492315. Maximum sequence length: 2049, sample length: 4266 [default0]:Skipping sample id=2743201. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2725818. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2477489. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2724084. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2746216. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2489378. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2757110. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2719014. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2720618. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2739543. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2712058. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2721174. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2466496. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2721905. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2756451. Maximum sequence length: 2049, sample length: 2757 [default0]:Skipping sample id=2716360. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2477084. Maximum sequence length: 2049, sample length: 3590 [default0]:Skipping sample id=2741006. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2737360. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2725775. Maximum sequence length: 2049, sample length: 5860 [default0]:Skipping sample id=2748311. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2720336. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2748798. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2725145. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2714538. Maximum sequence length: 2049, sample length: 4775 [default0]:Skipping sample id=2741242. Maximum sequence length: 2049, sample length: 5159 [default0]:Skipping sample id=2742609. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2723763. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2751463. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2721823. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2712185. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2727814. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2711962. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2750034. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2754686. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2713950. 
Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2732364. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2749625. Maximum sequence length: 2049, sample length: 4447 [default0]:Skipping sample id=2751705. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2466073. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2724848. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2714680. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2711313. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2715484. Maximum sequence length: 2049, sample length: 5393 [default0]:Skipping sample id=2742245. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2467794. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2484807. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2724301. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2756947. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2487395. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2750354. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2713673. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2747969. Maximum sequence length: 2049, sample length: 6263 [default0]:Skipping sample id=2716294. Maximum sequence length: 2049, sample length: 5826 [default0]:Skipping sample id=2722073. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2739168. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2713708. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2728334. 
Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2740346. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2736586. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2735422. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2488579. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2481074. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2730041. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2713415. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2733659. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2726447. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2723865. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2750834. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2489781. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2722592. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2724683. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2749078. Maximum sequence length: 2049, sample length: 5543 [default0]:Skipping sample id=2728206. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2736440. Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2491674. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2746163. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2722143. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2746749. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2754187. 
Maximum sequence length: 2049, sample length: 6973 [default0]:Skipping sample id=2493267. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2725054. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2725295. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2712505. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2729348. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2750557. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2746296. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2721974. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2742779. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2727353. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2724197. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2718628. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2756994. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2736495. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2735492. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2749682. Maximum sequence length: 2049, sample length: 5029 [default0]:Skipping sample id=2716069. Maximum sequence length: 2049, sample length: 5057 [default0]:Skipping sample id=2713621. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2722906. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2729683. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2751411. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2756573. 
Maximum sequence length: 2049, sample length: 3097 [default0]:Skipping sample id=2742514. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2711788. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2742332. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2479922. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2733733. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2732325. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2749428. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2741185. Maximum sequence length: 2049, sample length: 3889 [default0]:Skipping sample id=2716442. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2748698. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2717792. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2714171. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2749449. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2737109. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2751376. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2712430. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2753793. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2747766. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2717432. Maximum sequence length: 2049, sample length: 6956 [default0]:Skipping sample id=2752804. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2746330. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2495947. 
Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2747718. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2718034. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2714185. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2746780. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2754146. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2747118. Maximum sequence length: 2049, sample length: 7155 [default0]:Skipping sample id=2754433. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2716621. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2720460. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2499185. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2489901. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2742190. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2718994. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2748791. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2743889. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2736624. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2494570. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2739676. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2720668. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2498983. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2722138. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2737396. 
Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2732676. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2715958. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2749249. Maximum sequence length: 2049, sample length: 4227 [default0]:Skipping sample id=2490029. Maximum sequence length: 2049, sample length: 2727 [default0]:Skipping sample id=2746663. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2740281. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2733184. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2740617. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2729825. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2727579. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2739918. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2736567. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2490211. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2485421. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2752259. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2493833. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2741415. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2727250. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2730179. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2478830. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2719123. Maximum sequence length: 2049, sample length: 8506 [default0]:Skipping sample id=2729491. 
Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2731080. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2495894. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2478214. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2744602. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2720649. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2725843. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754833. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2717424. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2727609. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2752107. Maximum sequence length: 2049, sample length: 4551 [default0]:Skipping sample id=2721970. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2721121. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2728994. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2751420. Maximum sequence length: 2049, sample length: 5630 [default0]:Skipping sample id=2489780. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2491056. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2466499. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2489737. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2752570. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2728834. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2743751. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2733522. 
Maximum sequence length: 2049, sample length: 5032 [default0]:Skipping sample id=2497940. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2739513. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2745147. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2752998. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2712149. Maximum sequence length: 2049, sample length: 5291 [default0]:Skipping sample id=2727095. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2736934. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2756899. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2748669. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2731950. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2728670. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2744087. Maximum sequence length: 2049, sample length: 3490 [default0]:Skipping sample id=2726358. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2726192. Maximum sequence length: 2049, sample length: 5367 [default0]:Skipping sample id=2718120. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2724246. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2728346. Maximum sequence length: 2049, sample length: 3735 [default0]:Skipping sample id=2728971. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2744475. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2750600. Maximum sequence length: 2049, sample length: 4365 [default0]:Skipping sample id=2468730. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2721856. 
Maximum sequence length: 2049, sample length: 3486 [default0]:Skipping sample id=2721572. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712964. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2752386. Maximum sequence length: 2049, sample length: 3659 [default0]:Skipping sample id=2739808. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2738012. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2756473. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2730378. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2746994. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2712860. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2728642. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2748979. Maximum sequence length: 2049, sample length: 4816 [default0]:Skipping sample id=2736069. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2741312. Maximum sequence length: 2049, sample length: 4116 [default0]:Skipping sample id=2746890. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2719812. Maximum sequence length: 2049, sample length: 4209 [default0]:Skipping sample id=2744357. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2734773. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2729138. Maximum sequence length: 2049, sample length: 5678 [default0]:Skipping sample id=2484264. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2743095. Maximum sequence length: 2049, sample length: 6257 [default0]:Skipping sample id=2725847. Maximum sequence length: 2049, sample length: 4743 [default0]:Skipping sample id=2742150. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2733562. Maximum sequence length: 2049, sample length: 4077 [default0]:Skipping sample id=2720242. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2735921. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2756595. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2730666. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2495006. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2756629. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2485728. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2745482. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2713505. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2733853. Maximum sequence length: 2049, sample length: 5023 [default0]:Skipping sample id=2721105. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2715634. Maximum sequence length: 2049, sample length: 4233 [default0]:Skipping sample id=2756278. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2753758. Maximum sequence length: 2049, sample length: 6106 [default0]:Skipping sample id=2733971. Maximum sequence length: 2049, sample length: 5707 [default0]:Skipping sample id=2752810. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2723925. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2746523. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2734887. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2722482. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2734955. 
Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2712932. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735208. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2752322. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2748867. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2743956. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2730508. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2738026. Maximum sequence length: 2049, sample length: 5173 [default0]:Skipping sample id=2721881. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2744107. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2726287. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2731335. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2720959. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2757024. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2735735. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2735599. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2748862. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2495413. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2498409. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2721620. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2734137. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2731041. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2731178. 
Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2724042. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2745833. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2724249. Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2720369. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2711161. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2747305. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2754932. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2713941. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2483073. Maximum sequence length: 2049, sample length: 3520 [default0]:Skipping sample id=2744881. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2710999. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2499450. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2481620. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2713124. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2488741. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2724102. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2734534. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2743960. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2756883. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2714250. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2728226. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2755365. 
Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2740096. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2734766. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2715479. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2734129. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2719187. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2741374. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2720589. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2747192. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2493233. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2740089. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2749867. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2742513. Maximum sequence length: 2049, sample length: 3339 [default0]:Skipping sample id=2748383. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2732008. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2739051. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2711048. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2739120. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2738023. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2732778. Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2727611. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2750542. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2731171. 
Maximum sequence length: 2049, sample length: 4486 [default0]:Skipping sample id=2752530. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731301. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2732701. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2470774. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2712305. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2492787. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2711302. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2467185. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2715036. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2731082. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2727602. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2747306. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2478967. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2730669. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2737907. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2742922. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2498750. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2493335. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2731849. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2486566. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2488420. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2754142. 
Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2477491. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2724351. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2740651. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2721744. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2744553. Maximum sequence length: 2049, sample length: 5021 [default0]:Skipping sample id=2756907. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2729668. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2750426. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2748298. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2738769. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2743771. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2734838. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2751149. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2734504. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2723950. Maximum sequence length: 2049, sample length: 5132 [default0]:Skipping sample id=2738218. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2497390. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2489713. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2742322. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2734803. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2745930. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2711747. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2739177. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2722967. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2734010. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2717040. Maximum sequence length: 2049, sample length: 4714 [default0]:Skipping sample id=2748272. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2720723. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2736024. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2484568. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2720571. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2755972. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2735557. Maximum sequence length: 2049, sample length: 5221 [default0]:Skipping sample id=2749841. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2734644. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2751947. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2732258. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2728077. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2470206. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2728126. Maximum sequence length: 2049, sample length: 6292 [default0]:Skipping sample id=2487941. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2740809. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2712477. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2716456. 
Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2756585. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2720945. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2723169. Maximum sequence length: 2049, sample length: 5828 [default0]:Skipping sample id=2748465. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2712575. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715326. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2731830. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2724850. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2747977. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2477146. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2743842. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2487645. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2721833. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2725190. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2469831. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2723492. Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2714906. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2748971. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2757102. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2722417. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2713213. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2724751. 
Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2750589. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2480583. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2719078. Maximum sequence length: 2049, sample length: 6017 [default0]:Skipping sample id=2724275. Maximum sequence length: 2049, sample length: 3271 [default0]:Skipping sample id=2754579. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2714658. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2738489. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2729181. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2731503. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2741810. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2754996. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2714708. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2755816. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2493497. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2736727. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2714876. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2486899. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2756211. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726945. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2718681. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2728705. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2484703. 
Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2736110. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2749170. Maximum sequence length: 2049, sample length: 4778 [default0]:Skipping sample id=2734154. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2753982. Maximum sequence length: 2049, sample length: 5276 [default0]:Skipping sample id=2720608. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2748052. Maximum sequence length: 2049, sample length: 5212 [default0]:Skipping sample id=2748078. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2719743. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2721571. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2741314. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2734604. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2723976. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2495994. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2742687. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2746353. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2716060. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2726486. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2734507. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2725641. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2756425. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2742151. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2719069. 
Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2718957. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2718588. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2720405. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2721389. Maximum sequence length: 2049, sample length: 4885 [default0]:Skipping sample id=2744659. Maximum sequence length: 2049, sample length: 4011 [default0]:Skipping sample id=2737830. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2732457. Maximum sequence length: 2049, sample length: 5310 [default0]:Skipping sample id=2713775. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2720166. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2749276. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2718325. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2740530. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2737051. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2733698. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2726291. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2753929. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2722450. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2495294. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2749236. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2745737. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2740355. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2741413. 
Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2717008. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2749777. Maximum sequence length: 2049, sample length: 4451 [default0]:Skipping sample id=2726258. Maximum sequence length: 2049, sample length: 3993 [default0]:Skipping sample id=2732300. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2735225. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2730532. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2484156. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2734099. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2710982. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2748927. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2729284. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2738743. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2731248. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2725531. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2495831. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2735929. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2718521. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2729054. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2745428. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2711401. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2479283. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2496361. 
Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2726701. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2737842. Maximum sequence length: 2049, sample length: 3926 [default0]:Skipping sample id=2741993. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2715747. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2715943. Maximum sequence length: 2049, sample length: 6524 [default0]:Skipping sample id=2729478. Maximum sequence length: 2049, sample length: 5303 [default0]:Skipping sample id=2716622. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2755313. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2720114. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2715579. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712745. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2754113. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2739769. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2744179. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2748188. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2746702. Maximum sequence length: 2049, sample length: 4385 [default0]:Skipping sample id=2733281. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2716214. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726351. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2750736. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2720880. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2717989. 
Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2746844. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2735831. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2753611. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2737565. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2729704. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2744765. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2721326. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2729448. Maximum sequence length: 2049, sample length: 4026 [default0]:Skipping sample id=2728169. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2736293. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2719190. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2736645. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2728245. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2749891. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2724866. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2713315. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2748393. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2752740. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2716179. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2746459. Maximum sequence length: 2049, sample length: 6153 [default0]:Skipping sample id=2748639. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2740085. 
Maximum sequence length: 2049, sample length: 6665 [default0]:Skipping sample id=2488862. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2751425. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2718183. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2727844. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2730256. Maximum sequence length: 2049, sample length: 3665 [default0]:Skipping sample id=2731747. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2752086. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2470547. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2712321. Maximum sequence length: 2049, sample length: 6492 [default0]:Skipping sample id=2728911. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2721551. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2722097. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2727408. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2726713. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2737269. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2495538. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2731341. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2718722. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2751399. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2711727. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2714891. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2714181. 
Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2721077. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2740822. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735010. Maximum sequence length: 2049, sample length: 4143 [default0]:Skipping sample id=2752408. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2721722. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2738815. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2719390. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2468100. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2739766. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2496698. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2754639. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2712230. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2745947. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2479099. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2749993. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2469942. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2730866. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2742893. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2737524. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2486669. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2718825. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2737001. 
Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2715113. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2713570. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2741010. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2746334. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2722727. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2744639. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2749816. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2747279. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2721276. Maximum sequence length: 2049, sample length: 5020 [default0]:Skipping sample id=2724003. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2482117. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2722023. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2750145. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2750318. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2711278. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2495512. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2754257. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2755789. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2729189. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2737008. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2495047. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2496982. 
Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2742041. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2717757. Maximum sequence length: 2049, sample length: 5357 [default0]:Skipping sample id=2726959. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2747735. Maximum sequence length: 2049, sample length: 5465 [default0]:Skipping sample id=2726885. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2483466. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2727713. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2756363. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2486278. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2713872. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2748788. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2717823. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2721534. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2720014. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2717859. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2752345. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2467842. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2756898. Maximum sequence length: 2049, sample length: 3059 [default0]:Skipping sample id=2715057. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2479185. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2755472. Maximum sequence length: 2049, sample length: 6671 [default0]:Skipping sample id=2740921. 
Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2731265. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2722442. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2480944. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2739639. Maximum sequence length: 2049, sample length: 8471 [default0]:Skipping sample id=2723997. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2754249. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2751315. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2728182. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2721549. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2737423. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2717343. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2727545. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2725647. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2749778. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2737987. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725240. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2752116. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2478000. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2471279. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712097. Maximum sequence length: 2049, sample length: 4068 [default0]:Skipping sample id=2756209. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2720455. 
Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2711218. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2750236. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2745368. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2733273. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2711990. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2733109. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2720003. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2743387. Maximum sequence length: 2049, sample length: 4342 [default0]:Skipping sample id=2745639. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2722855. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2491301. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2712839. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2753545. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2739211. Maximum sequence length: 2049, sample length: 2929 [default0]:Skipping sample id=2713828. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2741302. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2741860. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2716366. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2725148. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2490107. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2721818. Maximum sequence length: 2049, sample length: 4974 [default0]:Skipping sample id=2721935. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2736245. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2729450. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2479516. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2731589. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2744708. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2731688. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2729000. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2490050. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2752924. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2711241. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2720349. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2733819. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2712463. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2482768. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2751912. Maximum sequence length: 2049, sample length: 4760 [default0]:Skipping sample id=2719219. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2714194. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2730774. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2713540. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2743426. Maximum sequence length: 2049, sample length: 4657 [default0]:Skipping sample id=2736022. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2731742. 
Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2731277. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2484448. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751537. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2746638. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2756251. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2495790. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2747139. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2729582. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2491815. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2484847. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2725600. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2755662. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2714466. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2714988. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2754369. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2489600. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2498643. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2733235. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2732470. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2740019. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2746443. Maximum sequence length: 2049, sample length: 3140 [default0]:Skipping sample id=2741551. 
Maximum sequence length: 2049, sample length: 3375 [default0]:Skipping sample id=2756979. Maximum sequence length: 2049, sample length: 4212 [default0]:Skipping sample id=2735731. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2716598. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715245. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2742562. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2752912. Maximum sequence length: 2049, sample length: 3510 [default0]:Skipping sample id=2711575. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2496316. Maximum sequence length: 2049, sample length: 4325 [default0]:Skipping sample id=2743764. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2730175. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2740725. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2711231. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2753741. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2485072. Maximum sequence length: 2049, sample length: 4328 [default0]:Skipping sample id=2727573. Maximum sequence length: 2049, sample length: 7095 [default0]:Skipping sample id=2712160. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2715055. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2749060. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2713253. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2716513. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2715964. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2742222. 
Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2753401. Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2728913. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2740990. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2466921. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2746732. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2721728. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2711091. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2489273. Maximum sequence length: 2049, sample length: 4083 [default0]:Skipping sample id=2752092. Maximum sequence length: 2049, sample length: 4648 [default0]:Skipping sample id=2742194. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2728647. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2721513. Maximum sequence length: 2049, sample length: 5093 [default0]:Skipping sample id=2737652. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2746360. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2739705. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2755923. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2751116. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2752784. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2730852. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2741213. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2745659. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2753512. 
Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2752316. Maximum sequence length: 2049, sample length: 4913 [default0]:Skipping sample id=2752590. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2734413. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2732154. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2711839. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2714622. Maximum sequence length: 2049, sample length: 4533 [default0]:Skipping sample id=2756351. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2753389. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2746314. Maximum sequence length: 2049, sample length: 3890 [default0]:Skipping sample id=2733137. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2718119. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2717534. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2732821. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2756097. Maximum sequence length: 2049, sample length: 3720 [default0]:Skipping sample id=2725155. Maximum sequence length: 2049, sample length: 4162 [default0]:Skipping sample id=2743355. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2733444. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2739734. Maximum sequence length: 2049, sample length: 5643 [default0]:Skipping sample id=2722984. Maximum sequence length: 2049, sample length: 6235 [default0]:Skipping sample id=2734908. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2467499. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2711235. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2754821. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2731564. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2745708. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2752022. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2755349. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2757077. Maximum sequence length: 2049, sample length: 4695 [default0]:Skipping sample id=2746371. Maximum sequence length: 2049, sample length: 5160 [default0]:Skipping sample id=2746327. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2716260. Maximum sequence length: 2049, sample length: 4878 [default0]:Skipping sample id=2479812. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2737673. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2717935. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2742533. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2714642. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2744043. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2750068. Maximum sequence length: 2049, sample length: 5174 [default0]:Skipping sample id=2722131. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2469749. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2491173. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2726647. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2498572. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2744115. 
Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2738808. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2736201. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2721740. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2754227. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2719759. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2735553. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2723419. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2495535. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2718676. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2732098. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2722716. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2713226. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2490194. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2735865. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2736516. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2752272. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2719387. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2716501. Maximum sequence length: 2049, sample length: 4903 [default0]:Skipping sample id=2734170. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2732663. Maximum sequence length: 2049, sample length: 3599 [default0]:Skipping sample id=2714088. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2725735. 
Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2741389. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2719260. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2711341. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2732803. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2714681. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2747351. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2745299. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2715777. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2735146. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2742191. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2744792. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2755556. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2757066. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2477364. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2724752. Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2732360. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2485660. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2738406. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2713329. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2719370. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2745562. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2753151. 
Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2712490. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2731602. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2720217. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2756014. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2746840. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2717886. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2725243. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2752724. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2721248. Maximum sequence length: 2049, sample length: 4352 [default0]:Skipping sample id=2498523. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2494684. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2751626. Maximum sequence length: 2049, sample length: 8039 [default0]:Skipping sample id=2744412. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2726944. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2749254. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2728520. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2720681. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2720822. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2752474. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2756600. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2721232. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2722763. 
Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2482873. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2733089. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2714748. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2739542. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2749047. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2728030. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2746999. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2482589. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2744552. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2487869. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2714138. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2752820. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2743532. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2725338. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2731538. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2733530. Maximum sequence length: 2049, sample length: 3398 [default0]:Skipping sample id=2749351. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2727467. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2718629. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2734234. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2745769. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2498957. 
Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2712797. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2470377. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2712443. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2712203. Maximum sequence length: 2049, sample length: 5943 [default0]:Skipping sample id=2721432. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2721065. Maximum sequence length: 2049, sample length: 4562 [default0]:Skipping sample id=2744863. Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2743504. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2715960. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2742346. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2722069. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2725685. Maximum sequence length: 2049, sample length: 4455 [default0]:Skipping sample id=2744796. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2754273. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2735401. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2487688. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2719062. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2715825. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2723328. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2745055. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2753874. Maximum sequence length: 2049, sample length: 4701 [default0]:Skipping sample id=2743149. 
Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2494696. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2717212. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2747541. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2713499. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2722270. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2736033. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2752819. Maximum sequence length: 2049, sample length: 6765 [default0]:Skipping sample id=2715787. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2719196. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2741105. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2734300. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2756670. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2499141. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2724407. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2720000. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2722265. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2481449. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2734204. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2728951. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2712474. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2746909. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2730629. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2713492. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2724280. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2724208. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2731588. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2495205. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2712224. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2713043. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2488234. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723013. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2728581. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2730029. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2481465. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2741777. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2719242. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2749163. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2756649. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2721157. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2495965. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2727744. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2748648. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2713882. Maximum sequence length: 2049, sample length: 4670 [default0]:Skipping sample id=2732731. 
Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2735595. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2736052. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2721330. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2467289. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2723912. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2716176. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2734323. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2739185. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2731915. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2741356. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2756296. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2499125. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2490110. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2740924. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2716066. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2719752. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2740633. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2725608. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2727626. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2726503. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2711106. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2733196. 
Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2724698. Maximum sequence length: 2049, sample length: 6870 [default0]:Skipping sample id=2722332. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2744151. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2756630. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2740331. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2731367. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2743336. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2711102. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2716618. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2720184. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2746599. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2752128. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2485175. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2731095. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2713725. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2747520. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727062. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2734014. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2724754. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2725980. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2735161. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2492737. 
Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2732082. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2720510. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730627. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2740891. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2727231. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2742653. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2714161. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2717150. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2744346. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2725264. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2745457. Maximum sequence length: 2049, sample length: 4015 [default0]:Skipping sample id=2714985. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2486924. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2716990. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2735331. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2717591. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2722437. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2494991. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2724781. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2725204. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2752849. Maximum sequence length: 2049, sample length: 3751 [default0]:Skipping sample id=2713547. 
Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2727936. Maximum sequence length: 2049, sample length: 4754 [default0]:Skipping sample id=2498063. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2714403. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2750658. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2722796. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2742021. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2726823. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2731333. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2489133. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717815. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2744793. Maximum sequence length: 2049, sample length: 3756 [default0]:Skipping sample id=2744089. Maximum sequence length: 2049, sample length: 3001 [default0]:Skipping sample id=2723264. Maximum sequence length: 2049, sample length: 5049 [default0]:Skipping sample id=2729749. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2722561. Maximum sequence length: 2049, sample length: 6963 [default0]:Skipping sample id=2725045. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2755970. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2485220. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2734889. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2726749. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2477960. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2711314. 
Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2741605. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2716287. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2495119. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2734145. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2717414. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2486901. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2756713. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2735226. Maximum sequence length: 2049, sample length: 5512 [default0]:Skipping sample id=2486966. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2735789. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2712784. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2746037. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2711116. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2732760. Maximum sequence length: 2049, sample length: 2988 [default0]:Skipping sample id=2743248. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2750665. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2730883. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2736442. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2732159. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2712001. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2743671. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2739840. 
Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2727861. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2756067. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2747302. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2741549. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2715985. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2738048. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2727580. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2737279. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2714029. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2751850. Maximum sequence length: 2049, sample length: 4213 [default0]:Skipping sample id=2725584. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2740299. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2739034. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2754817. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2746667. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2729367. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2754228. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2465761. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2757008. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2743500. Maximum sequence length: 2049, sample length: 6101 [default0]:Skipping sample id=2483007. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2729629. 
Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2711333. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2746625. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2751837. Maximum sequence length: 2049, sample length: 5241 [default0]:Skipping sample id=2724709. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2755528. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2755752. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2715921. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2749001. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2744175. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2743752. Maximum sequence length: 2049, sample length: 3804 [default0]:Skipping sample id=2750077. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2723227. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2743855. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2715902. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2741481. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2736790. Maximum sequence length: 2049, sample length: 5045 [default0]:Skipping sample id=2748416. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2734016. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2753790. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2498759. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731316. Maximum sequence length: 2049, sample length: 4978 [default0]:Skipping sample id=2712738. 
Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2717527. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2754757. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2720281. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2752502. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2754929. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2715342. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2721477. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2749963. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2735573. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2744276. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2730154. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2740027. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723623. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2751332. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2746168. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2488095. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2725203. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2738642. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2492777. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2753860. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2723040. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750801. 
Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2751598. Maximum sequence length: 2049, sample length: 3706 [default0]:Skipping sample id=2755082. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2491360. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2722839. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2741162. Maximum sequence length: 2049, sample length: 5621 [default0]:Skipping sample id=2711923. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2729171. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2489307. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2756456. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2750555. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2469866. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2717325. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2734342. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2722169. Maximum sequence length: 2049, sample length: 4721 [default0]:Skipping sample id=2741787. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2714526. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2721724. Maximum sequence length: 2049, sample length: 5796 [default0]:Skipping sample id=2723980. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2725335. Maximum sequence length: 2049, sample length: 7792 [default0]:Skipping sample id=2715052. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2739459. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2751880. 
Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2727535. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2726602. Maximum sequence length: 2049, sample length: 7319 [default0]:Skipping sample id=2728887. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2753537. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2742144. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2720715. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2492763. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2744518. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2714276. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2747931. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2743353. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2719070. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2756100. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2745854. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2752027. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2743442. Maximum sequence length: 2049, sample length: 3631 [default0]:Skipping sample id=2723983. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2721406. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2744662. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2719134. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2727923. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2725433. 
Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2719425. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2740971. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2735094. Maximum sequence length: 2049, sample length: 3767 [default0]:Skipping sample id=2755518. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2753335. Maximum sequence length: 2049, sample length: 6766 [default0]:Skipping sample id=2753931. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2716804. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2750604. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2720094. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2724320. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2713747. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2735638. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2752431. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2716841. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2756355. Maximum sequence length: 2049, sample length: 5492 [default0]:Skipping sample id=2481480. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2751335. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2750875. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2732462. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2488735. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2743343. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2465903. 
Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2756002. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2741890. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2717862. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2750548. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2745245. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2745546. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2493769. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724884. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2736628. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2466971. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2749996. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2743999. Maximum sequence length: 2049, sample length: 4788 [default0]:Skipping sample id=2723896. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2734082. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2723572. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2724942. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2732348. Maximum sequence length: 2049, sample length: 5634 [default0]:Skipping sample id=2739819. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2719423. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2491328. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713335. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2722806. 
Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2730291. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2751690. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2731462. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2498331. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2728804. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2748100. Maximum sequence length: 2049, sample length: 3933 [default0]:Skipping sample id=2743429. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2744324. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2715465. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2753382. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2718964. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2715394. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2744368. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2739339. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2723186. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2713181. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2737637. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2725375. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2739830. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2719249. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2712977. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2742536. 
Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2486988. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2731033. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2732613. Maximum sequence length: 2049, sample length: 4057 [default0]:Skipping sample id=2756112. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2713046. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2742488. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2730752. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742269. Maximum sequence length: 2049, sample length: 3906 [default0]:Skipping sample id=2737569. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2721116. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2737991. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2711205. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2716344. Maximum sequence length: 2049, sample length: 7329 [default0]:Skipping sample id=2750515. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2732700. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2732091. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2721488. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2716220. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2740279. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2754810. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2734198. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2730729. 
Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2722829. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2736536. Maximum sequence length: 2049, sample length: 4372 [default0]:Skipping sample id=2730938. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2727395. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2730396. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2723706. Maximum sequence length: 2049, sample length: 3376 [default0]:Skipping sample id=2754654. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2723158. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2484400. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2729416. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2727045. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2756017. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2751375. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2746827. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2730166. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2751296. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2496590. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2743121. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2478314. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2711661. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2718896. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2755741. 
Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2718930. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2725992. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2483582. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2716418. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2719908. Maximum sequence length: 2049, sample length: 4803 [default0]:Skipping sample id=2482186. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2747129. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2722852. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2748964. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2736260. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2715079. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2744554. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2469947. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2729886. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2755132. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2737939. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2745745. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2493901. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2723733. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2466240. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2740975. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2746703. 
Maximum sequence length: 2049, sample length: 4864 [default0]:Skipping sample id=2750920. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2750706. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2729561. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2736461. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2745839. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2731900. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2496091. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2742312. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2466535. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2729959. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2742863. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2734251. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2751826. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2493927. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2734883. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2722282. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2741304. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2734797. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2723086. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2480895. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2755975. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2728621. 
Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2748278. Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2716753. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2746043. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2725619. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2487241. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2731254. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2719917. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2714935. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2753489. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2742738. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2713892. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2745416. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2743531. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2733074. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2490346. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2748329. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2715460. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2728357. Maximum sequence length: 2049, sample length: 4928 [default0]:Skipping sample id=2718868. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2735054. Maximum sequence length: 2049, sample length: 4765 [default0]:Skipping sample id=2498060. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2726216. 
Maximum sequence length: 2049, sample length: 3090 [default0]:Skipping sample id=2747626. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2737824. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2735272. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2477695. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2718687. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2741224. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2725215. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2717045. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712371. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2736550. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2724219. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2468476. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2736681. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2754487. Maximum sequence length: 2049, sample length: 5534 [default0]:Skipping sample id=2469281. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2736875. Maximum sequence length: 2049, sample length: 3004 [default0]:Skipping sample id=2736771. Maximum sequence length: 2049, sample length: 5835 [default0]:Skipping sample id=2754437. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2738329. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2726049. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2743657. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2752581. 
Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2751382. Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2489597. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2737149. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2730347. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2724060. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2723785. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2745098. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2755155. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2717037. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2750343. Maximum sequence length: 2049, sample length: 6673 [default0]:Skipping sample id=2713917. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2711946. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2724207. Maximum sequence length: 2049, sample length: 6024 [default0]:Skipping sample id=2488471. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2492863. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2750586. Maximum sequence length: 2049, sample length: 4023 [default0]:Skipping sample id=2713808. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2745529. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2482305. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2728917. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2714519. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2726264. 
Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2747650. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2737193. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2723757. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2729172. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2722281. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2489425. Maximum sequence length: 2049, sample length: 3383 [default0]:Skipping sample id=2477507. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2755715. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2748069. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2479102. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2712928. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2756729. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2490190. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2482849. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2753037. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2741252. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2721910. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2495222. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2741394. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2734771. Maximum sequence length: 2049, sample length: 4291 [default0]:Skipping sample id=2713988. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2748456. 
Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2716707. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2721663. Maximum sequence length: 2049, sample length: 6080 [default0]:Skipping sample id=2755877. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2719323. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2483211. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2729586. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2731892. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2754736. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2711751. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2477345. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2736555. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2744124. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2713558. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2713881. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2466326. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2754694. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2747495. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2491143. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2722319. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2744867. Maximum sequence length: 2049, sample length: 3190 [default0]:Skipping sample id=2734419. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2714592. 
Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2736872. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2746806. Maximum sequence length: 2049, sample length: 3907 [default0]:Skipping sample id=2715934. Maximum sequence length: 2049, sample length: 5360 [default0]:Skipping sample id=2736860. Maximum sequence length: 2049, sample length: 7147 [default0]:Skipping sample id=2745631. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2714380. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2712388. Maximum sequence length: 2049, sample length: 4132 [default0]:Skipping sample id=2717586. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2752326. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2750003. Maximum sequence length: 2049, sample length: 3861 [default0]:Skipping sample id=2727437. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2724539. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2713656. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2714374. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2490333. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735832. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2742826. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2714755. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2720746. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2737489. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2733577. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2725650. 
Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2754240. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2727830. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2743381. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2729569. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2739904. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2714607. Maximum sequence length: 2049, sample length: 4985 [default0]:Skipping sample id=2751504. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2713392. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2724611. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2482315. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2721764. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2714494. Maximum sequence length: 2049, sample length: 4153 [default0]:Skipping sample id=2736548. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2728272. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2730680. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2727737. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2742250. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2744273. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2736744. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2712427. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2746576. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2753581. 
Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2749148. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2729512. Maximum sequence length: 2049, sample length: 4590 [default0]:Skipping sample id=2720659. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2717053. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2756434. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2720005. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2736547. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2741069. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2745460. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2756717. Maximum sequence length: 2049, sample length: 6160 [default0]:Skipping sample id=2739394. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2745667. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2714624. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2714871. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2721317. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2730807. Maximum sequence length: 2049, sample length: 4591 [default0]:Skipping sample id=2748060. Maximum sequence length: 2049, sample length: 3713 [default0]:Skipping sample id=2730352. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2720027. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2711115. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2480842. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2494995. 
Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2484159. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2748441. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2738768. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2756252. Maximum sequence length: 2049, sample length: 5867 [default0]:Skipping sample id=2725030. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2740542. Maximum sequence length: 2049, sample length: 3914 [default0]:Skipping sample id=2726212. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2733029. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2485624. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745842. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2748708. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2481397. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2489062. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2747701. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2725037. Maximum sequence length: 2049, sample length: 4810 [default0]:Skipping sample id=2755941. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2488751. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2735477. Maximum sequence length: 2049, sample length: 4822 [default0]:Skipping sample id=2732439. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2743457. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2721264. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2722831. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2746120. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2469701. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2716973. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2747174. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2736372. Maximum sequence length: 2049, sample length: 3424 [default0]:Skipping sample id=2742526. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2739921. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2491341. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2729237. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2754559. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2484114. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2492841. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2732276. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2496470. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2729979. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2738252. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2736187. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2746135. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2742238. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2727152. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2741682. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2723399. 
Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2741880. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2714623. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2711893. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2739170. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2749916. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2722964. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2490292. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2742244. Maximum sequence length: 2049, sample length: 4067 [default0]:Skipping sample id=2738146. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2748477. Maximum sequence length: 2049, sample length: 2644 [default0]:Skipping sample id=2745552. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2741409. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2718014. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2731506. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2718654. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2756328. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2726139. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2719790. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2720374. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2733771. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2738381. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2728934. 
Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2477477. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2745943. Maximum sequence length: 2049, sample length: 3493 [default0]:Skipping sample id=2756044. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2733427. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2754834. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2752438. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2731401. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2734607. Maximum sequence length: 2049, sample length: 5303 [default0]:Skipping sample id=2743059. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2723879. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2729475. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2750612. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2736128. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2714581. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2746753. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2750427. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2728411. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2727482. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2728977. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2738715. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2727813. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2748593. 
Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2712516. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2488708. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2742659. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2750691. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2712706. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2714706. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2748768. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2737970. Maximum sequence length: 2049, sample length: 3667 [default0]:Skipping sample id=2753144. Maximum sequence length: 2049, sample length: 4906 [default0]:Skipping sample id=2753680. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2745658. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2728549. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2715945. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2752655. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2720680. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2487246. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2716967. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2744975. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2731437. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2721359. Maximum sequence length: 2049, sample length: 4869 [default0]:Skipping sample id=2717137. Maximum sequence length: 2049, sample length: 4571 [default0]:Skipping sample id=2746697. 
Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2721280. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2729807. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2480124. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2725073. Maximum sequence length: 2049, sample length: 3043 [default0]:Skipping sample id=2746897. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2737140. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2731286. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2732152. Maximum sequence length: 2049, sample length: 3556 [default0]:Skipping sample id=2725091. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2750717. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2468795. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2757040. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2723462. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2743616. Maximum sequence length: 2049, sample length: 4696 [default0]:Skipping sample id=2724244. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2751865. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2721117. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2755435. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745088. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2725150. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2755275. Maximum sequence length: 2049, sample length: 4956 [default0]:Skipping sample id=2732876. 
Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2737220. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2743269. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2716188. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2468173. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2730432. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2736012. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715667. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2742636. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2752993. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2724357. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2724922. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2716127. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2750024. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2733508. Maximum sequence length: 2049, sample length: 5370 [default0]:Skipping sample id=2735277. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2733769. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2747376. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2736406. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2743279. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2727940. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2753932. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2466192. 
Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2721923. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2719633. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2727864. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2746861. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2715962. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2746978. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2714704. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2713581. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2471072. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2721614. Maximum sequence length: 2049, sample length: 2868 [default0]:Skipping sample id=2755399. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2720165. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2737487. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2741645. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2742949. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2743768. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2756008. Maximum sequence length: 2049, sample length: 4472 [default0]:Skipping sample id=2722345. Maximum sequence length: 2049, sample length: 4339 [default0]:Skipping sample id=2747889. Maximum sequence length: 2049, sample length: 4943 [default0]:Skipping sample id=2754479. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2715122. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2736894. 
Maximum sequence length: 2049, sample length: 4997 [default0]:Skipping sample id=2740823. Maximum sequence length: 2049, sample length: 5750 [default0]:Skipping sample id=2731432. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2477034. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2753893. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2719787. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2743124. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2745852. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2753246. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2718534. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2738339. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2739050. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2716462. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2756592. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2720452. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2748517. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2753722. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749788. Maximum sequence length: 2049, sample length: 4019 [default0]:Skipping sample id=2726381. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2753605. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2718877. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2715974. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2713629. 
Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2754056. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2716737. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2746050. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2718026. Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2724126. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2734840. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2731349. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2752449. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2752668. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2484732. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2715534. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2715535. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2755061. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2712264. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2746973. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2726361. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2730957. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2719815. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2756201. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2745016. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2721539. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2737472. 
Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2467542. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2730782. Maximum sequence length: 2049, sample length: 5532 [default0]:Skipping sample id=2746401. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2751588. Maximum sequence length: 2049, sample length: 6162 [default0]:Skipping sample id=2739908. Maximum sequence length: 2049, sample length: 6555 [default0]:Skipping sample id=2748346. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2742139. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2740060. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2730383. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2734888. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2753798. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2736491. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2714970. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2725090. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2484684. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2747595. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2732514. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2485218. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2720965. Maximum sequence length: 2049, sample length: 3656 [default0]:Skipping sample id=2755526. Maximum sequence length: 2049, sample length: 4044 [default0]:Skipping sample id=2755470. Maximum sequence length: 2049, sample length: 3623 [default0]:Skipping sample id=2711330. 
Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737769. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2724625. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2731886. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2754954. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2748722. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2719041. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2748504. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2732894. Maximum sequence length: 2049, sample length: 4475 [default0]:Skipping sample id=2471230. Maximum sequence length: 2049, sample length: 3104 [default0]:Skipping sample id=2732023. Maximum sequence length: 2049, sample length: 4681 [default0]:Skipping sample id=2746710. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2726133. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2732606. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2715685. Maximum sequence length: 2049, sample length: 6345 [default0]:Skipping sample id=2754135. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2751792. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2754578. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2751174. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2735996. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2480341. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737929. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2748085. 
Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2755771. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2498359. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2717851. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2746722. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2728583. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2736162. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2711549. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2711223. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2483477. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2752024. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2720146. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2718047. Maximum sequence length: 2049, sample length: 4934 [default0]:Skipping sample id=2730010. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2736490. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2739160. Maximum sequence length: 2049, sample length: 5102 [default0]:Skipping sample id=2740139. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2482292. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2733944. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2754558. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2752748. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2722288. Maximum sequence length: 2049, sample length: 14264 [default0]:Skipping sample id=2746157. 
Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2717161. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2733814. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2737976. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2721258. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2712538. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2470290. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2725116. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2736972. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2718060. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2753736. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2481002. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2731230. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2718076. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2739579. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2719384. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2751494. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2483317. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2720876. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2717243. Maximum sequence length: 2049, sample length: 5466 [default0]:Skipping sample id=2747727. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2718161. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2721461. 
Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2729671. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2723459. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2712274. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2712614. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2713694. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2714372. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2718311. Maximum sequence length: 2049, sample length: 4030 [default0]:Skipping sample id=2737243. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2730827. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2467601. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2746902. Maximum sequence length: 2049, sample length: 4267 [default0]:Skipping sample id=2746794. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2716628. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2729132. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2737659. Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2729838. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2726368. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2715617. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2719040. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2714815. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2747266. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2465878. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2743171. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2716972. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2732946. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2725865. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2730896. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2732673. Maximum sequence length: 2049, sample length: 3865 [default0]:Skipping sample id=2735633. Maximum sequence length: 2049, sample length: 2899 [default0]:Skipping sample id=2750818. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2729299. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2738831. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2730358. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2715012. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2756961. Maximum sequence length: 2049, sample length: 4870 [default0]:Skipping sample id=2756049. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2716359. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2751632. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2730930. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2749709. Maximum sequence length: 2049, sample length: 3057 [default0]:Skipping sample id=2716202. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2735630. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2751593. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2491577. 
Maximum sequence length: 2049, sample length: 3314 [default0]:Skipping sample id=2712262. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2756250. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2737983. Maximum sequence length: 2049, sample length: 5486 [default0]:Skipping sample id=2484836. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2745107. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2718557. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2723471. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2496419. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2734314. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2481914. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2719033. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2477385. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2737668. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2729373. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2754350. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2725723. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2736415. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2714350. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2740590. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2716088. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2726290. Maximum sequence length: 2049, sample length: 4141 [default0]:Skipping sample id=2740354. 
Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2743937. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2739346. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2746253. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2745352. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2756516. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2487823. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2496261. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2712172. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2752104. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2718935. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2746987. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2750056. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2711464. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2720400. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2724196. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2713798. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2485117. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2725423. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2711476. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2721836. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2736886. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2713273. 
Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2483129. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2753190. Maximum sequence length: 2049, sample length: 3701 [default0]:Skipping sample id=2750339. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2731681. Maximum sequence length: 2049, sample length: 5816 [default0]:Skipping sample id=2726218. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2726453. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2485454. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2743580. Maximum sequence length: 2049, sample length: 4018 [default0]:Skipping sample id=2488838. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2712686. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2743126. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2746819. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2715340. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2715040. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2740165. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2716255. Maximum sequence length: 2049, sample length: 5938 [default0]:Skipping sample id=2740577. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2731361. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2753370. Maximum sequence length: 2049, sample length: 5492 [default0]:Skipping sample id=2724945. Maximum sequence length: 2049, sample length: 3257 [default0]:Skipping sample id=2717091. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2744898. 
Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2748992. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2720429. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2712987. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2740860. Maximum sequence length: 2049, sample length: 3299 [default0]:Skipping sample id=2715447. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2728548. Maximum sequence length: 2049, sample length: 3668 [default0]:Skipping sample id=2714653. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2725436. Maximum sequence length: 2049, sample length: 5997 [default0]:Skipping sample id=2733586. Maximum sequence length: 2049, sample length: 3354 [default0]:Skipping sample id=2714429. Maximum sequence length: 2049, sample length: 6465 [default0]:Skipping sample id=2711456. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2733453. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2752079. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2718366. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2749120. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2470827. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2746497. Maximum sequence length: 2049, sample length: 8032 [default0]:Skipping sample id=2465747. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2717052. Maximum sequence length: 2049, sample length: 4099 [default0]:Skipping sample id=2724929. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2711003. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2751433. 
Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2731391. Maximum sequence length: 2049, sample length: 4204 [default0]:Skipping sample id=2748197. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2739238. Maximum sequence length: 2049, sample length: 2619 [default0]:Skipping sample id=2478334. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2483411. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2733652. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2726793. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2735637. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2721929. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2729445. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2477761. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2494201. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2715627. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2718459. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2736049. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2714386. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2715395. Maximum sequence length: 2049, sample length: 5542 [default0]:Skipping sample id=2714212. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2484195. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2732340. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2754981. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2466030. 
Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2739578. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2734208. Maximum sequence length: 2049, sample length: 5115 [default0]:Skipping sample id=2468353. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2730021. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2731356. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723588. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2717134. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2711681. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2724799. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2728286. Maximum sequence length: 2049, sample length: 4316 [default0]:Skipping sample id=2723180. Maximum sequence length: 2049, sample length: 4190 [default0]:Skipping sample id=2730776. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2720138. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2714337. Maximum sequence length: 2049, sample length: 5554 [default0]:Skipping sample id=2753621. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2736635. Maximum sequence length: 2049, sample length: 4229 [default0]:Skipping sample id=2741766. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2725551. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2483415. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2724158. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2711548. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2746418. 
Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2721777. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2730260. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2746898. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2718346. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2756375. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2715999. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2722257. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2720837. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2727002. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2753224. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2734035. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2740124. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2724576. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2754383. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2735298. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2747971. Maximum sequence length: 2049, sample length: 3934 [default0]:Skipping sample id=2713911. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2485188. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2753104. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2749309. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2735586. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2749217. 
Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2726582. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2726233. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2730677. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2749740. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2734533. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2732288. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2723517. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2489817. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2714781. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2740262. Maximum sequence length: 2049, sample length: 6479 [default0]:Skipping sample id=2478806. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2751794. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2717253. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2741844. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2755936. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2728690. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2715895. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2479538. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2719886. Maximum sequence length: 2049, sample length: 3829 [default0]:Skipping sample id=2711499. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2749636. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2739446. 
Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2719877. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2724872. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2484427. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2752551. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2711436. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2747280. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2498849. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2744137. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2470154. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2723499. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2752084. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2716599. Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2734106. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2738637. Maximum sequence length: 2049, sample length: 5145 [default0]:Skipping sample id=2751076. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2719109. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2730017. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2495400. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2495798. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2723119. Maximum sequence length: 2049, sample length: 3144 [default0]:Skipping sample id=2721494. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2736150. 
Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2715891. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2713622. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2728716. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2730171. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2747521. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2744383. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2492632. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2727010. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2726677. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2727342. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2755282. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2712558. Maximum sequence length: 2049, sample length: 4334 [default0]:Skipping sample id=2714022. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2731743. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2744600. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2756151. Maximum sequence length: 2049, sample length: 5435 [default0]:Skipping sample id=2720048. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2497341. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2727410. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2716303. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2746194. Maximum sequence length: 2049, sample length: 4497 [default0]:Skipping sample id=2745026. 
Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2754934. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2747566. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2726370. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2718817. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2495335. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2748607. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2724476. Maximum sequence length: 2049, sample length: 2792 [default0]:Skipping sample id=2728039. Maximum sequence length: 2049, sample length: 4792 [default0]:Skipping sample id=2743529. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2714293. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2750095. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2744147. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2725981. Maximum sequence length: 2049, sample length: 8168 [default0]:Skipping sample id=2740606. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2753255. Maximum sequence length: 2049, sample length: 5268 [default0]:Skipping sample id=2754567. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2726741. Maximum sequence length: 2049, sample length: 3788 [default0]:Skipping sample id=2736178. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2712469. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2719781. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2728110. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2735949. 
Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2716675. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2726671. Maximum sequence length: 2049, sample length: 5096 [default0]:Skipping sample id=2755273. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2726003. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2497810. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2734139. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2717691. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2727048. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2487937. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2754839. Maximum sequence length: 2049, sample length: 4067 [default0]:Skipping sample id=2740621. Maximum sequence length: 2049, sample length: 4466 [default0]:Skipping sample id=2744229. Maximum sequence length: 2049, sample length: 5528 [default0]:Skipping sample id=2736616. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2753140. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2480505. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2728892. Maximum sequence length: 2049, sample length: 3260 [default0]:Skipping sample id=2468224. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2729067. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2746843. Maximum sequence length: 2049, sample length: 4807 [default0]:Skipping sample id=2716918. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2756521. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2753627. 
Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2711880. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2744722. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2719088. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2738157. Maximum sequence length: 2049, sample length: 4269 [default0]:Skipping sample id=2731487. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2727141. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2719984. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2488834. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2711417. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2743341. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2711076. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2751574. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2712355. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2733975. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2495608. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2756145. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2725905. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2713948. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2719751. Maximum sequence length: 2049, sample length: 4533 [default0]:Skipping sample id=2747511. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2493870. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2730137. 
Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2735009. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2737361. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2492353. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2742320. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2750874. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2467482. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730088. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2489088. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2498936. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2731037. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2755354. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2496838. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2479214. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2755374. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2718798. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2745388. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2496124. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2468556. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2740018. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2749577. Maximum sequence length: 2049, sample length: 6637 [default0]:Skipping sample id=2721310. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2748339. 
Maximum sequence length: 2049, sample length: 4328 [default0]:Skipping sample id=2754460. Maximum sequence length: 2049, sample length: 3456 [default0]:Skipping sample id=2756871. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2754972. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2468046. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2488707. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2747873. Maximum sequence length: 2049, sample length: 3925 [default0]:Skipping sample id=2741340. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2753681. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2720638. Maximum sequence length: 2049, sample length: 5163 [default0]:Skipping sample id=2715975. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2733538. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2749426. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2730357. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2721708. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2721298. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2720198. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2742869. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2743371. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2720651. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2723483. Maximum sequence length: 2049, sample length: 4965 [default0]:Skipping sample id=2739453. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2719032. 
Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2719891. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2727089. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2730263. Maximum sequence length: 2049, sample length: 4312 [default0]:Skipping sample id=2723236. Maximum sequence length: 2049, sample length: 3922 [default0]:Skipping sample id=2718048. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2750651. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2743217. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2750082. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2732028. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2750364. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2748626. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2746627. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2485160. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2744646. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2712072. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2737203. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2715929. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2735203. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2495401. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2731661. Maximum sequence length: 2049, sample length: 5192 [default0]:Skipping sample id=2488551. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2714261. 
Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2494460. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2730070. Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2743872. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2729393. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2725661. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714388. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2752657. Maximum sequence length: 2049, sample length: 5847 [default0]:Skipping sample id=2711327. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2741134. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2740170. Maximum sequence length: 2049, sample length: 3761 [default0]:Skipping sample id=2736152. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2742998. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2718447. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2482877. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2734556. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2720164. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2727592. Maximum sequence length: 2049, sample length: 4739 [default0]:Skipping sample id=2741627. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2754946. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2712346. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2723952. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2720101. 
Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2715636. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2710974. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2722204. Maximum sequence length: 2049, sample length: 4851 [default0]:Skipping sample id=2727568. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2752679. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2738033. Maximum sequence length: 2049, sample length: 3937 [default0]:Skipping sample id=2499158. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2745993. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2723930. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2753251. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2756409. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2752670. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2738721. Maximum sequence length: 2049, sample length: 4808 [default0]:Skipping sample id=2720064. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2743247. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2468104. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2714439. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2723910. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2730651. Maximum sequence length: 2049, sample length: 3135 [default0]:Skipping sample id=2728144. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2752054. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2730264. 
Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2499287. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2717475. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2717777. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2734964. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2733660. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2754920. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2724968. Maximum sequence length: 2049, sample length: 5529 [default0]:Skipping sample id=2711013. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2728027. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2743585. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2492828. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2718849. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2721508. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2741849. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2744754. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2712458. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2747776. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2744214. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2746565. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2754858. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2724958. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2747117. 
Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2716034. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2719885. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2713609. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2743568. Maximum sequence length: 2049, sample length: 4277 [default0]:Skipping sample id=2734388. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2489004. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2718693. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2742715. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2471308. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2482638. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2721378. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2742176. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2723561. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2488727. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2731449. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2749204. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2748692. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2484063. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2733839. Maximum sequence length: 2049, sample length: 6318 [default0]:Skipping sample id=2737940. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2468312. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2734330. 
Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2728497. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2722636. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2714873. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2738451. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2742764. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2731418. Maximum sequence length: 2049, sample length: 4258 [default0]:Skipping sample id=2756779. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2728130. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2716129. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2734212. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2715797. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2477215. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2731989. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2745714. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2717884. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2716531. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2752734. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2753837. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2719793. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2739932. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2738609. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2728379. 
Maximum sequence length: 2049, sample length: 5371 [default0]:Skipping sample id=2467300. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2740037. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2718492. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2717332. Maximum sequence length: 2049, sample length: 5943 [default0]:Skipping sample id=2719250. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2730509. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2715751. Maximum sequence length: 2049, sample length: 4017 [default0]:Skipping sample id=2752015. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2749006. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2750981. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2716845. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2735461. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2755350. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2744134. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2734995. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2750083. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2729480. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2499263. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2749073. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2717677. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2721816. Maximum sequence length: 2049, sample length: 4725 [default0]:Skipping sample id=2750510. 
Maximum sequence length: 2049, sample length: 4740 [default0]:Skipping sample id=2478399. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2746445. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2720548. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2736900. Maximum sequence length: 2049, sample length: 4859 [default0]:Skipping sample id=2712680. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2748004. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2756923. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2731992. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2730457. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2721305. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2734042. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2718318. Maximum sequence length: 2049, sample length: 6491 [default0]:Skipping sample id=2726446. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2722047. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2714382. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2740230. Maximum sequence length: 2049, sample length: 2967 [default0]:Skipping sample id=2742986. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729657. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2720718. Maximum sequence length: 2049, sample length: 4736 [default0]:Skipping sample id=2490049. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2720662. Maximum sequence length: 2049, sample length: 5146 [default0]:Skipping sample id=2739373. 
Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2711340. Maximum sequence length: 2049, sample length: 6661 [default0]:Skipping sample id=2748361. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2756386. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2749628. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2745236. Maximum sequence length: 2049, sample length: 5050 [default0]:Skipping sample id=2730464. Maximum sequence length: 2049, sample length: 7775 [default0]:Skipping sample id=2742994. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2741676. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2497900. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2718388. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2744209. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2725431. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2724600. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2754754. Maximum sequence length: 2049, sample length: 4445 [default0]:Skipping sample id=2718787. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2723987. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2733210. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2483240. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2749130. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2727856. Maximum sequence length: 2049, sample length: 4762 [default0]:Skipping sample id=2755104. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2735839. 
Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2724477. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2732447. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2742056. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2716352. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2732590. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2724002. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2726806. Maximum sequence length: 2049, sample length: 6674 [default0]:Skipping sample id=2714654. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2731457. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2734774. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2742104. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2493488. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2742921. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2715060. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2721915. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2711942. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2728303. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2722988. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2743169. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2727196. Maximum sequence length: 2049, sample length: 3615 [default0]:Skipping sample id=2483378. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2470276. 
Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2731561. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2713583. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2720074. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2491216. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2748633. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2751193. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2717467. Maximum sequence length: 2049, sample length: 5747 [default0]:Skipping sample id=2717042. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2756960. Maximum sequence length: 2049, sample length: 4391 [default0]:Skipping sample id=2753613. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2722937. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2744817. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2711617. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2733171. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2724238. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737485. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2715843. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2717373. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2717220. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2749308. Maximum sequence length: 2049, sample length: 3443 [default0]:Skipping sample id=2724170. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2715339. 
Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2753198. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2731428. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2755382. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2742180. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2727742. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2713186. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2730009. Maximum sequence length: 2049, sample length: 4523 [default0]:Skipping sample id=2718652. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2496540. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2716062. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2723591. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2723001. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2751213. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2754046. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2732804. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2726478. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2733493. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2753044. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2755649. Maximum sequence length: 2049, sample length: 6485 [default0]:Skipping sample id=2720873. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2740169. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2713860. 
Maximum sequence length: 2049, sample length: 5373 [default0]:Skipping sample id=2743689. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2732574. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2718190. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2744139. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2726240. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2480033. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2494363. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2712008. Maximum sequence length: 2049, sample length: 4571 [default0]:Skipping sample id=2716859. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2742263. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2728737. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2725371. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2726393. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2750556. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2731776. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2737650. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2750991. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2732370. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2719590. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2468697. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2734238. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2489022. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2489890. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2723624. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2469398. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2498895. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2731471. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2730806. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2726184. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2494644. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731374. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2743911. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2729650. Maximum sequence length: 2049, sample length: 6629 [default0]:Skipping sample id=2482247. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2748075. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2466064. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2737869. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2731869. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2487344. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2738577. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2728953. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2717116. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2734508. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2724330. 
Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2734456. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2713443. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2727396. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2725521. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2483865. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2731952. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2731933. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2717971. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2466757. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2714071. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2728214. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2723141. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2729936. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2725195. Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2756154. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2711101. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2496616. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2751892. Maximum sequence length: 2049, sample length: 2778 [default0]:Skipping sample id=2733597. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2742566. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2724383. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2736382. 
Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2747550. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2734062. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2730606. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2721756. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2724470. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2732032. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2722901. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2753061. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2752940. Maximum sequence length: 2049, sample length: 4373 [default0]:Skipping sample id=2720613. Maximum sequence length: 2049, sample length: 5393 [default0]:Skipping sample id=2484553. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2748022. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2750623. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2736854. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2719922. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2712549. Maximum sequence length: 2049, sample length: 4764 [default0]:Skipping sample id=2713337. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2717446. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2489693. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2731973. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2737838. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2722940. 
Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2744634. Maximum sequence length: 2049, sample length: 5104 [default0]:Skipping sample id=2726451. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2752362. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2712564. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2719727. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2738982. Maximum sequence length: 2049, sample length: 4742 [default0]:Skipping sample id=2756504. Maximum sequence length: 2049, sample length: 4335 [default0]:Skipping sample id=2735919. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2729965. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2752945. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2723458. Maximum sequence length: 2049, sample length: 3072 [default0]:Skipping sample id=2721207. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2727949. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719446. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2731310. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2496963. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2738710. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2753297. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2740171. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2711325. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2742014. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2756694. 
Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2730815. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2749909. Maximum sequence length: 2049, sample length: 4010 [default0]:Skipping sample id=2739995. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2721198. Maximum sequence length: 2049, sample length: 5057 [default0]:Skipping sample id=2727404. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2734960. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2713281. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2483887. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2750944. Maximum sequence length: 2049, sample length: 5017 [default0]:Skipping sample id=2734830. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2755483. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2722222. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2716794. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2756072. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2721850. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2713530. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2734954. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2730706. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2730928. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2742847. Maximum sequence length: 2049, sample length: 3132 [default0]:Skipping sample id=2717197. Maximum sequence length: 2049, sample length: 4048 [default0]:Skipping sample id=2477916. 
Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2730148. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2481790. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2754395. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2723304. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2724452. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2750687. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2477335. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2727882. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2722534. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2743513. Maximum sequence length: 2049, sample length: 6004 [default0]:Skipping sample id=2721542. Maximum sequence length: 2049, sample length: 5113 [default0]:Skipping sample id=2715242. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2727484. Maximum sequence length: 2049, sample length: 4058 [default0]:Skipping sample id=2721682. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2499171. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2734635. Maximum sequence length: 2049, sample length: 8224 [default0]:Skipping sample id=2717556. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2720538. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732224. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2739469. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2712811. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2713912. 
Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2756426. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2732228. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2742410. Maximum sequence length: 2049, sample length: 3375 [default0]:Skipping sample id=2750245. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2740239. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2751746. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2740502. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2718033. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2738560. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2719422. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2744298. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2737654. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2735306. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2736265. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2728582. Maximum sequence length: 2049, sample length: 4618 [default0]:Skipping sample id=2754102. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2483802. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2749232. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2717286. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2731775. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2468032. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2735139. 
Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2722865. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2482653. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2491248. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2487571. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2738694. Maximum sequence length: 2049, sample length: 3866 [default0]:Skipping sample id=2469471. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2715037. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2755291. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2716103. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2721727. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2740257. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2731940. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2741064. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2742685. Maximum sequence length: 2049, sample length: 4607 [default0]:Skipping sample id=2489480. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2735501. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2728181. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2721875. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2736936. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2752747. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2726714. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2744395. 
Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2713846. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2721916. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746885. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2478611. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2735220. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2715923. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2749683. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2725737. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2728473. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2724443. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2737870. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2713751. Maximum sequence length: 2049, sample length: 5688 [default0]:Skipping sample id=2711282. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2737399. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2737865. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2711299. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2742882. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2749040. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2751336. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2723631. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2751530. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2494677. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2751759. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2742782. Maximum sequence length: 2049, sample length: 4437 [default0]:Skipping sample id=2718449. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2493305. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2720069. Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2719447. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2742398. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2497197. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2712181. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2736901. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2732009. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2756761. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2713234. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2729061. Maximum sequence length: 2049, sample length: 6800 [default0]:Skipping sample id=2735693. Maximum sequence length: 2049, sample length: 2440 [default0]:Skipping sample id=2499113. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2721027. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2713934. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2714690. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2720474. Maximum sequence length: 2049, sample length: 2971 [default0]:Skipping sample id=2729616. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2747048. 
Maximum sequence length: 2049, sample length: 5053 [default0]:Skipping sample id=2493400. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2716256. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2466287. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2735533. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2487019. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2497761. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2728864. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2489233. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2719253. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2746254. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2740119. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2716454. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2478536. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2740529. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2743077. Maximum sequence length: 2049, sample length: 6761 [default0]:Skipping sample id=2738411. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2498588. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2733432. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2488206. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2470949. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2716201. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2717230. 
Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2471123. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2717049. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2489386. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2732897. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2726176. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2493750. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2720852. Maximum sequence length: 2049, sample length: 8161 [default0]:Skipping sample id=2484291. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2737921. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2712419. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2465995. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2743835. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2720513. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2717326. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2748484. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2717742. Maximum sequence length: 2049, sample length: 4563 [default0]:Skipping sample id=2732918. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2717665. Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2467819. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2471010. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2483086. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2742177. 
Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2730330. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2712401. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2735585. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2749932. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2720736. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2752199. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2731270. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2748253. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2735012. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2469951. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2748365. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2721645. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2466081. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2723791. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2726173. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2469386. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2740686. Maximum sequence length: 2049, sample length: 5009 [default0]:Skipping sample id=2746708. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2741846. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2716516. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2736764. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2717940. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2742290. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2754022. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2750552. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2718458. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2716532. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2746630. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2740636. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2733202. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2477910. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2753950. Maximum sequence length: 2049, sample length: 4776 [default0]:Skipping sample id=2755543. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2730168. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722507. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2712090. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2743669. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2749018. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2751439. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2738327. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2713255. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2750028. Maximum sequence length: 2049, sample length: 4528 [default0]:Skipping sample id=2721700. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2734800. 
Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2713550. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2722071. Maximum sequence length: 2049, sample length: 7290 [default0]:Skipping sample id=2489037. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2750681. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2726415. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2752404. Maximum sequence length: 2049, sample length: 4627 [default0]:Skipping sample id=2741559. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2741461. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2495833. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2750779. Maximum sequence length: 2049, sample length: 4250 [default0]:Skipping sample id=2721048. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2711068. Maximum sequence length: 2049, sample length: 4041 [default0]:Skipping sample id=2712906. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2723883. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2716991. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2744125. Maximum sequence length: 2049, sample length: 3576 [default0]:Skipping sample id=2755703. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2728106. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2753343. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2749843. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2746421. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2726266. 
Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2713566. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738122. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2736986. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2721318. Maximum sequence length: 2049, sample length: 4830 [default0]:Skipping sample id=2747990. Maximum sequence length: 2049, sample length: 4223 [default0]:Skipping sample id=2750848. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2736697. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2757021. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2736848. Maximum sequence length: 2049, sample length: 3245 [default0]:Skipping sample id=2727421. Maximum sequence length: 2049, sample length: 7261 [default0]:Skipping sample id=2491916. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2494121. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2751972. Maximum sequence length: 2049, sample length: 4814 [default0]:Skipping sample id=2727701. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2711466. Maximum sequence length: 2049, sample length: 5789 [default0]:Skipping sample id=2723368. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2737353. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2478582. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2725731. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2477227. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2732645. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2711657. 
Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2739695. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2491768. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2730410. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2722886. Maximum sequence length: 2049, sample length: 5364 [default0]:Skipping sample id=2494017. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2727733. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2730973. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2715868. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2734295. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2719591. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2715693. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2745177. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2735981. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2728284. Maximum sequence length: 2049, sample length: 4239 [default0]:Skipping sample id=2713366. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2756834. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2716883. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2724423. Maximum sequence length: 2049, sample length: 5329 [default0]:Skipping sample id=2721053. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2740615. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2748177. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2733404. 
Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2730444. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2743936. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2745524. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2732540. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2727681. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2733463. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2734285. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2712205. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2721271. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2745294. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2717933. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2481519. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2750283. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2739245. Maximum sequence length: 2049, sample length: 4545 [default0]:Skipping sample id=2751906. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2715290. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2727382. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2487432. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2720243. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2721083. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2720670. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2731180. 
Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2480524. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2726717. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2730995. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2712804. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2732712. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2752355. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2727651. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2749768. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2744445. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2747975. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2741325. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2712021. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2718785. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2752365. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2734904. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2470456. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2736505. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2745468. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2725943. Maximum sequence length: 2049, sample length: 6352 [default0]:Skipping sample id=2743662. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2745553. Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2746221. 
Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2483696. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2737697. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2750800. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2484061. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2483147. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2719707. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2471178. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2740291. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2754913. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2742816. Maximum sequence length: 2049, sample length: 3691 [default0]:Skipping sample id=2711152. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2744862. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2482259. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2724489. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2729016. Maximum sequence length: 2049, sample length: 4492 [default0]:Skipping sample id=2731004. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2739658. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2728901. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2754557. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2750098. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2734255. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2732810. 
Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2746282. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2731522. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2714387. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2736926. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2753440. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2749295. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2490469. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2485014. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2753237. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2712569. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2723064. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2734734. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2713519. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2726853. Maximum sequence length: 2049, sample length: 5779 [default0]:Skipping sample id=2756081. Maximum sequence length: 2049, sample length: 3661 [default0]:Skipping sample id=2754983. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2751987. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2750843. Maximum sequence length: 2049, sample length: 4338 [default0]:Skipping sample id=2715412. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2478537. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2726858. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2716641. 
Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2732746. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2748687. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2711778. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2724778. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2757109. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2735473. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2755405. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2732993. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2469977. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2735403. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2717840. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2481624. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2729674. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2494392. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2741212. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2717895. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2749632. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2754530. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2756922. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2742578. Maximum sequence length: 2049, sample length: 4034 [default0]:Skipping sample id=2735856. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2478826. 
Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2756900. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2731763. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2732802. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2741005. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2737211. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2736545. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2742589. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2723808. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2478851. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2714989. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2751306. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2721328. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2480597. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2752536. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2724870. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2731964. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2711331. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2713133. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2740654. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2483738. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2725477. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2744399. 
Maximum sequence length: 2049, sample length: 3719 [default0]:Skipping sample id=2717454. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2752498. Maximum sequence length: 2049, sample length: 5527 [default0]:Skipping sample id=2745614. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2485259. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2727349. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2729321. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2736557. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2744609. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2744000. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2723852. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2720913. Maximum sequence length: 2049, sample length: 4496 [default0]:Skipping sample id=2724910. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2714444. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2469187. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2714218. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2738331. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2754555. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2716765. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2481034. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2747243. Maximum sequence length: 2049, sample length: 6495 [default0]:Skipping sample id=2737860. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721301. 
Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2716038. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2747122. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2728710. Maximum sequence length: 2049, sample length: 6318 [default0]:Skipping sample id=2720302. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2749560. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2744072. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2757020. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2499209. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2730522. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2743197. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2755632. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2716577. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2722241. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2742572. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2742019. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2732524. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2740780. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2720961. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2747992. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2466008. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2495121. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2730359. 
Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2729261. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2479258. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2485746. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2723867. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2712193. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2725377. Maximum sequence length: 2049, sample length: 5617 [default0]:Skipping sample id=2724316. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2747409. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2721274. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2733227. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2732941. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2752082. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2752638. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2740948. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2484561. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2713466. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2753374. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2736939. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2714050. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2726825. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2722859. Maximum sequence length: 2049, sample length: 2697 [default0]:Skipping sample id=2744623. 
Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2756896. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2715946. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2745464. Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2724337. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2748032. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2713389. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2742742. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2739943. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2714945. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2494425. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2717934. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2740594. Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2728747. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2734141. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2716348. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2741609. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2727018. Maximum sequence length: 2049, sample length: 4178 [default0]:Skipping sample id=2728468. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2755758. Maximum sequence length: 2049, sample length: 4231 [default0]:Skipping sample id=2752409. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2737505. Maximum sequence length: 2049, sample length: 4929 [default0]:Skipping sample id=2718205. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2741588. Maximum sequence length: 2049, sample length: 6238 [default0]:Skipping sample id=2470463. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2712495. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2713298. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2466587. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2755889. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732727. Maximum sequence length: 2049, sample length: 5336 [default0]:Skipping sample id=2484165. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2721020. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2742772. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2754092. Maximum sequence length: 2049, sample length: 3735 [default0]:Skipping sample id=2744158. Maximum sequence length: 2049, sample length: 4402 [default0]:Skipping sample id=2747061. Maximum sequence length: 2049, sample length: 4706 [default0]:Skipping sample id=2744300. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2729043. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2496268. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2737397. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2750501. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2714434. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2730081. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2713914. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2732461. 
Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2721240. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2718569. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2721423. Maximum sequence length: 2049, sample length: 5833 [default0]:Skipping sample id=2718760. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2720555. Maximum sequence length: 2049, sample length: 4157 [default0]:Skipping sample id=2735539. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2745475. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2735036. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2729129. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2712883. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2720169. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2739764. Maximum sequence length: 2049, sample length: 8121 [default0]:Skipping sample id=2730698. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2729864. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2488509. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2729105. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2718466. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2723887. Maximum sequence length: 2049, sample length: 4509 [default0]:Skipping sample id=2729404. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2730325. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2749812. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2728248. 
Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2734579. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2744467. Maximum sequence length: 2049, sample length: 4799 [default0]:Skipping sample id=2721753. Maximum sequence length: 2049, sample length: 4111 [default0]:Skipping sample id=2752375. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2734647. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2719399. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2718988. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2753762. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2724051. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2716728. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2727487. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2739712. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2487692. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2747609. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2742999. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2737583. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2737666. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2729802. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2729109. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2747422. Maximum sequence length: 2049, sample length: 3367 [default0]:Skipping sample id=2711665. Maximum sequence length: 2049, sample length: 2954 [default0]:Skipping sample id=2728208. 
Maximum sequence length: 2049, sample length: 4004 [default0]:Skipping sample id=2752970. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2750295. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2753361. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2718695. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738706. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2487927. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2749280. Maximum sequence length: 2049, sample length: 3918 [default0]:Skipping sample id=2715292. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2728011. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2489463. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2734936. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2723923. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2493828. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2480641. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2486045. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2724673. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732482. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2723503. Maximum sequence length: 2049, sample length: 3541 [default0]:Skipping sample id=2742117. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2753079. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2736581. Maximum sequence length: 2049, sample length: 5977 [default0]:Skipping sample id=2728031. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2713642. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2754904. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2467724. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2479201. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724130. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2726909. Maximum sequence length: 2049, sample length: 5184 [default0]:Skipping sample id=2737607. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2731452. Maximum sequence length: 2049, sample length: 5978 [default0]:Skipping sample id=2731847. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2729639. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2731846. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2739429. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2485896. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2742466. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2736713. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2719913. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2732122. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2719029. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2723162. Maximum sequence length: 2049, sample length: 4140 [default0]:Skipping sample id=2749837. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2741188. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2492344. 
Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2746988. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2753775. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2741032. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2751965. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2741827. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2723632. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748789. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2727784. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2711376. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2748377. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2740987. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2720118. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2717518. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2743918. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2735756. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2751203. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2723795. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2744301. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2745029. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2720063. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2724644. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2722693. 
Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2489033. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2741153. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2722370. Maximum sequence length: 2049, sample length: 5150 [default0]:Skipping sample id=2731917. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2734522. Maximum sequence length: 2049, sample length: 4138 [default0]:Skipping sample id=2712003. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2746365. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2732568. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2733399. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2747148. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2465937. Maximum sequence length: 2049, sample length: 3600 [default0]:Skipping sample id=2747983. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2716927. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2487639. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2717894. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2724499. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2753876. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2722358. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2719229. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2726789. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2748439. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2482319. 
Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2715156. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2754147. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2747791. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2741901. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2714742. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2716984. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2714153. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2725042. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2466770. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2727721. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2493314. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2477297. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2734992. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2747902. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2737948. Maximum sequence length: 2049, sample length: 6760 [default0]:Skipping sample id=2737306. Maximum sequence length: 2049, sample length: 3220 [default0]:Skipping sample id=2756918. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2729272. Maximum sequence length: 2049, sample length: 4142 [default0]:Skipping sample id=2481109. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2728319. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2732084. Maximum sequence length: 2049, sample length: 3859 [default0]:Skipping sample id=2713495. 
Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2734792. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2469777. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2754661. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2712350. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2737359. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739537. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2742871. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2721760. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2480274. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2711482. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2727788. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2493516. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2727734. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2722085. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2714536. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2747746. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2720689. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2743975. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2722832. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2740616. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2747632. Maximum sequence length: 2049, sample length: 3433 [default0]:Skipping sample id=2470825. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2713665. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2743080. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2719043. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2737974. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2753851. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2727416. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2750321. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2738311. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2731757. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2721072. Maximum sequence length: 2049, sample length: 6524 [default0]:Skipping sample id=2720831. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2742230. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2739409. Maximum sequence length: 2049, sample length: 5609 [default0]:Skipping sample id=2745250. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2745033. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2731030. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2728490. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2750863. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2728558. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2715878. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2486146. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2483622. 
Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2756460. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2730907. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2713117. Maximum sequence length: 2049, sample length: 6543 [default0]:Skipping sample id=2751166. Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2713658. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2753192. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2748133. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733006. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2736892. Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2752419. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2743018. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2739663. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2722584. Maximum sequence length: 2049, sample length: 3210 [default0]:Skipping sample id=2734429. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2745419. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2715637. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2742842. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2712601. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2711291. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2715879. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2751815. Maximum sequence length: 2049, sample length: 2922 [default0]:Skipping sample id=2739555. 
Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2739011. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2750973. Maximum sequence length: 2049, sample length: 2434 [default0]:Skipping sample id=2483202. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2739389. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2748142. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2741368. Maximum sequence length: 2049, sample length: 8032 [default0]:Skipping sample id=2733182. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2484126. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2730084. Maximum sequence length: 2049, sample length: 4560 [default0]:Skipping sample id=2723062. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2718329. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2722324. Maximum sequence length: 2049, sample length: 6445 [default0]:Skipping sample id=2724582. Maximum sequence length: 2049, sample length: 4193 [default0]:Skipping sample id=2732342. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2722771. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2755951. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2716763. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2733710. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2711016. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2724925. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2726710. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2721950. 
Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2731226. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2746476. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2479033. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2483875. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2721674. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2724167. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2489672. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2719921. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2722564. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2726962. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2745731. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2728614. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2726669. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2727817. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2714067. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2495131. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2751187. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2465914. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2482435. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2730161. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2723675. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2755121. 
Maximum sequence length: 2049, sample length: 5650 [default0]:Skipping sample id=2730077. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2757009. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2719907. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2710987. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2715346. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2718631. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2751867. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724843. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2729853. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2727810. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717609. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2754097. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2729862. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2751903. Maximum sequence length: 2049, sample length: 3849 [default0]:Skipping sample id=2736093. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2711862. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2756004. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2745632. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2720035. Maximum sequence length: 2049, sample length: 2529 [default0]:Skipping sample id=2737292. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2737017. Maximum sequence length: 2049, sample length: 4619 [default0]:Skipping sample id=2748089. 
Maximum sequence length: 2049, sample length: 7077 [default0]:Skipping sample id=2727476. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2711259. Maximum sequence length: 2049, sample length: 4411 [default0]:Skipping sample id=2495899. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2731497. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2752098. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2748700. Maximum sequence length: 2049, sample length: 4774 [default0]:Skipping sample id=2478825. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2713259. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2742698. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2478469. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2483972. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2747826. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2741593. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2490684. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2720306. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2749582. Maximum sequence length: 2049, sample length: 3043 [default0]:Skipping sample id=2736261. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2737538. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2749315. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2494952. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2733900. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2493013. 
Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2723381. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2729680. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2711918. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2754269. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2711908. Maximum sequence length: 2049, sample length: 4463 [default0]:Skipping sample id=2740459. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2735850. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2752786. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2746066. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2736519. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2739749. Maximum sequence length: 2049, sample length: 4466 [default0]:Skipping sample id=2736105. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2740697. Maximum sequence length: 2049, sample length: 3502 [default0]:Skipping sample id=2734195. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2753056. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2744029. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2725859. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2723096. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2751024. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2467091. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2731719. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2724359. 
Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717572. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2733028. Maximum sequence length: 2049, sample length: 4377 [default0]:Skipping sample id=2726620. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2718767. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2740153. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2712182. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2751541. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2730280. Maximum sequence length: 2049, sample length: 6616 [default0]:Skipping sample id=2756722. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2734267. Maximum sequence length: 2049, sample length: 5678 [default0]:Skipping sample id=2720562. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2718384. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2730950. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2711405. Maximum sequence length: 2049, sample length: 6073 [default0]:Skipping sample id=2714572. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2742445. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2719640. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2753427. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2753457. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2713032. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2756314. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2732168. 
Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719348. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2755123. Maximum sequence length: 2049, sample length: 5081 [default0]:Skipping sample id=2716343. Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2737746. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2724919. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2724230. Maximum sequence length: 2049, sample length: 3789 [default0]:Skipping sample id=2717920. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2721195. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2729135. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2730687. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2722087. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2483177. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2752693. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2745379. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2731674. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2751368. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2756175. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2756301. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2730284. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2722461. Maximum sequence length: 2049, sample length: 5950 [default0]:Skipping sample id=2753055. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2745037. 
Maximum sequence length: 2049, sample length: 6556 [default0]:Skipping sample id=2723822. Maximum sequence length: 2049, sample length: 4208 [default0]:Skipping sample id=2724318. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2756358. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2743251. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2750306. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2717929. Maximum sequence length: 2049, sample length: 3187 [default0]:Skipping sample id=2756938. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2489682. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2720840. Maximum sequence length: 2049, sample length: 4211 [default0]:Skipping sample id=2724319. Maximum sequence length: 2049, sample length: 4043 [default0]:Skipping sample id=2730506. Maximum sequence length: 2049, sample length: 5600 [default0]:Skipping sample id=2466699. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2730324. Maximum sequence length: 2049, sample length: 3714 [default0]:Skipping sample id=2726654. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2736258. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2467859. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2712158. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2727204. Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2722348. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2744217. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2730220. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2493587. 
Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2720120. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2755640. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2744992. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2753121. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2744676. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2727377. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2466642. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2717393. Maximum sequence length: 2049, sample length: 3949 [default0]:Skipping sample id=2726969. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2742449. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2723880. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2717287. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2751883. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2732160. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2736175. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2737164. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2742001. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2488917. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2484896. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2719099. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2725691. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2746728. 
Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2743902. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2746748. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2746234. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2753338. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2746903. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2755197. Maximum sequence length: 2049, sample length: 5172 [default0]:Skipping sample id=2746948. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2729891. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2722583. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2722837. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2726051. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2738549. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2742591. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2740579. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2723713. Maximum sequence length: 2049, sample length: 5724 [default0]:Skipping sample id=2734265. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2730736. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2752505. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2718664. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2749400. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2750940. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2751565. 
Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2736515. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2745994. Maximum sequence length: 2049, sample length: 5861 [default0]:Skipping sample id=2747121. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2492724. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2739926. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2729215. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2492291. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2751546. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2738242. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2720968. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2720824. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2731136. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2726112. Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2743969. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2735104. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2487814. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2737047. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2469876. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2714600. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2738958. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2747845. Maximum sequence length: 2049, sample length: 3122 [default0]:Skipping sample id=2730594. 
Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2729630. Maximum sequence length: 2049, sample length: 5197 [default0]:Skipping sample id=2728595. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2730690. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2727912. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2493241. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2721091. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2740186. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2465792. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2750962. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2754165. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2743131. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2751756. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2713269. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2732187. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2713136. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2739143. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2712562. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2753433. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754155. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2720697. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2710995. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2752159. 
Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2756366. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2746623. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2734495. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2745035. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2468814. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2729265. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2751846. Maximum sequence length: 2049, sample length: 6246 [default0]:Skipping sample id=2717471. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2478493. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2482064. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2749103. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2713979. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2721261. Maximum sequence length: 2049, sample length: 3810 [default0]:Skipping sample id=2716917. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2716469. Maximum sequence length: 2049, sample length: 6661 [default0]:Skipping sample id=2736435. Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2717856. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2742735. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2731921. Maximum sequence length: 2049, sample length: 6063 [default0]:Skipping sample id=2723055. Maximum sequence length: 2049, sample length: 4914 [default0]:Skipping sample id=2745857. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2728893. 
Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2723196. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2752048. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2482604. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2713806. Maximum sequence length: 2049, sample length: 3170 [default0]:Skipping sample id=2732511. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2714697. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2747402. Maximum sequence length: 2049, sample length: 3205 [default0]:Skipping sample id=2753580. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2725050. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2749922. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2755693. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2713718. Maximum sequence length: 2049, sample length: 4613 [default0]:Skipping sample id=2728587. Maximum sequence length: 2049, sample length: 3055 [default0]:Skipping sample id=2496057. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2733199. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2485679. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2750160. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2751968. Maximum sequence length: 2049, sample length: 5873 [default0]:Skipping sample id=2480331. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2744624. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2719290. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2728600. 
Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2740914. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2728193. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2479644. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2754760. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2730920. Maximum sequence length: 2049, sample length: 5136 [default0]:Skipping sample id=2747665. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2733506. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2732150. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2732424. Maximum sequence length: 2049, sample length: 4453 [default0]:Skipping sample id=2728968. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2734404. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2483564. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2728154. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2723756. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2719615. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2737675. Maximum sequence length: 2049, sample length: 14223 [default0]:Skipping sample id=2743558. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2488482. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2487848. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2715440. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2488365. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2494573. 
Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2724881. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2753430. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2494343. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2490918. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2731569. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2716792. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2740536. Maximum sequence length: 2049, sample length: 4573 [default0]:Skipping sample id=2733775. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2485024. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2491146. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2723249. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2749362. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2719782. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2737121. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2735565. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2726000. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2722532. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2735255. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2719746. Maximum sequence length: 2049, sample length: 6614 [default0]:Skipping sample id=2468007. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2734053. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2749984. 
Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2723635. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2721715. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2728069. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2745705. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2743468. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2726430. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2723790. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2756235. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2747051. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2741054. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2728463. Maximum sequence length: 2049, sample length: 4858 [default0]:Skipping sample id=2754991. Maximum sequence length: 2049, sample length: 3052 [default0]:Skipping sample id=2745671. Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2732005. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2729499. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714043. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2718444. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2484452. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2756441. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2730764. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2719577. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2721507. 
Maximum sequence length: 2049, sample length: 5208 [default0]:Skipping sample id=2746836. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2726550. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2724269. Maximum sequence length: 2049, sample length: 4377 [default0]:Skipping sample id=2748660. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2714645. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2716545. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2738151. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2735689. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2724735. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2746070. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2734930. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2752102. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2718577. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2727333. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2720766. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2714943. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2733371. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2712800. Maximum sequence length: 2049, sample length: 3187 [default0]:Skipping sample id=2734597. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2722618. Maximum sequence length: 2049, sample length: 4984 [default0]:Skipping sample id=2470605. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2736878. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2755637. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2735456. Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2751273. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2718402. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2483329. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2725198. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2729827. Maximum sequence length: 2049, sample length: 6853 [default0]:Skipping sample id=2490926. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2754078. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2725821. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2716800. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2732607. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2711215. Maximum sequence length: 2049, sample length: 7329 [default0]:Skipping sample id=2753074. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2722944. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2737843. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2742000. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2714362. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2744902. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2741239. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2717141. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2753458. 
Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2492868. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2747336. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2735561. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2723428. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2731453. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2735356. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2745750. Maximum sequence length: 2049, sample length: 4467 [default0]:Skipping sample id=2716505. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2735140. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2468858. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2715904. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2711059. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2481983. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2470650. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2719670. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2711084. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2725019. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2746808. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2727584. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2719597. Maximum sequence length: 2049, sample length: 3651 [default0]:Skipping sample id=2711618. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2711546. 
Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2744477. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2750173. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2749418. Maximum sequence length: 2049, sample length: 5110 [default0]:Skipping sample id=2753002. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2747924. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2495080. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2726473. Maximum sequence length: 2049, sample length: 3568 [default0]:Skipping sample id=2755385. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2714361. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2733165. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2725767. Maximum sequence length: 2049, sample length: 5170 [default0]:Skipping sample id=2729447. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2723673. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2484760. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2746304. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2752303. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2734640. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2754029. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2740328. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2489118. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2727005. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2741184. 
Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2732842. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2481288. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2739798. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2728528. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2726286. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2721237. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718273. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2744008. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2739604. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2727618. Maximum sequence length: 2049, sample length: 5507 [default0]:Skipping sample id=2721612. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2723394. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2742273. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2757036. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2730718. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2747070. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2467291. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2757007. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2750374. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2489778. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2466843. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2744905. 
Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2737375. Maximum sequence length: 2049, sample length: 5078 [default0]:Skipping sample id=2744148. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2739378. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2488606. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2729091. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2483452. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2732304. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2734759. Maximum sequence length: 2049, sample length: 5197 [default0]:Skipping sample id=2715381. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714496. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2742101. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2733614. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2713654. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2745536. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2486624. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2751824. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2486171. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2756996. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2725015. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2743204. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2754064. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2751466. 
Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2753770. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2742254. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2742781. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2754081. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2737443. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2733154. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2719067. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2498703. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2738150. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2732596. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2749749. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2716172. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2735288. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2713682. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2755075. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2711892. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2732429. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2714539. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2738820. Maximum sequence length: 2049, sample length: 5535 [default0]:Skipping sample id=2723686. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2494230. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2724627. 
Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2748582. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2750332. Maximum sequence length: 2049, sample length: 6247 [default0]:Skipping sample id=2732345. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2730274. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2754706. Maximum sequence length: 2049, sample length: 6635 [default0]:Skipping sample id=2751963. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2752250. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2722151. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2712179. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2744403. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2743618. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2499119. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2722033. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2721035. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2711658. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2729194. Maximum sequence length: 2049, sample length: 4284 [default0]:Skipping sample id=2752357. Maximum sequence length: 2049, sample length: 4169 [default0]:Skipping sample id=2714101. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2490760. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2747212. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2749516. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2483383. 
Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2735361. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2739142. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2711855. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2738827. Maximum sequence length: 2049, sample length: 3838 [default0]:Skipping sample id=2756936. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2718742. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2743193. Maximum sequence length: 2049, sample length: 3558 [default0]:Skipping sample id=2726179. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2738759. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2745951. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2733507. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2469119. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2753133. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2498488. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2717353. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2729405. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2722578. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2723003. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2716215. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2716506. Maximum sequence length: 2049, sample length: 3491 [default0]:Skipping sample id=2755993. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2478946. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2718414. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2746317. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2739396. Maximum sequence length: 2049, sample length: 5552 [default0]:Skipping sample id=2726821. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2731098. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2749450. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2744461. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2756568. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2746692. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2755393. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2756461. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2751539. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2716181. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2721250. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2712357. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2716228. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2747140. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2720174. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2717479. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2743088. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2715361. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2746858. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2719544. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2715577. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2467429. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2477016. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2730428. Maximum sequence length: 2049, sample length: 5221 [default0]:Skipping sample id=2738194. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2744777. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2712835. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2738483. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2749097. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2727390. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2737458. Maximum sequence length: 2049, sample length: 7102 [default0]:Skipping sample id=2742391. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2482773. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756968. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2717918. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2744572. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2741540. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2494252. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2717304. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2751383. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2756420. 
Maximum sequence length: 2049, sample length: 6404 [default0]:Skipping sample id=2747222. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2749686. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2737644. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2495200. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2714574. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2751584. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2753207. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2711805. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2747675. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2727110. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2740162. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2742563. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2738440. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2734190. Maximum sequence length: 2049, sample length: 6800 [default0]:Skipping sample id=2734183. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2721637. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2719516. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2711519. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2738800. Maximum sequence length: 2049, sample length: 6967 [default0]:Skipping sample id=2724139. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2493261. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2734576. 
Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2712095. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2744948. Maximum sequence length: 2049, sample length: 3427 [default0]:Skipping sample id=2732675. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2718624. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2734945. Maximum sequence length: 2049, sample length: 6455 [default0]:Skipping sample id=2735081. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2490064. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2739088. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2495030. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2725896. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2740999. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2711524. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2723415. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2731323. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2723915. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2470506. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2740564. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2750788. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2753494. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2495463. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2723793. Maximum sequence length: 2049, sample length: 4175 [default0]:Skipping sample id=2747681. 
Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2745300. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2730631. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2752508. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2740156. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2747849. Maximum sequence length: 2049, sample length: 4696 [default0]:Skipping sample id=2722418. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2470795. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2728416. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2735948. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2470693. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2755969. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2737093. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2731783. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2750837. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2753838. Maximum sequence length: 2049, sample length: 4023 [default0]:Skipping sample id=2741442. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2729291. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2733893. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2717146. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2731467. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2716485. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2733298. 
Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2711789. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2729642. Maximum sequence length: 2049, sample length: 4291 [default0]:Skipping sample id=2747451. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2717750. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2736793. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2729119. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2735435. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2731514. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2717552. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2723830. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2724009. Maximum sequence length: 2049, sample length: 5695 [default0]:Skipping sample id=2711512. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2715632. Maximum sequence length: 2049, sample length: 2993 [default0]:Skipping sample id=2754846. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2723339. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2738530. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2729488. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2469202. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2727213. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2721343. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2724146. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2744712. 
Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2755453. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2724180. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2721933. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2728828. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2725915. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2496510. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2716378. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2732141. Maximum sequence length: 2049, sample length: 6050 [default0]:Skipping sample id=2749635. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2743982. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2756588. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2737796. Maximum sequence length: 2049, sample length: 4952 [default0]:Skipping sample id=2755461. Maximum sequence length: 2049, sample length: 6245 [default0]:Skipping sample id=2727052. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2730234. Maximum sequence length: 2049, sample length: 2578 [default0]:Skipping sample id=2471154. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2731359. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2746214. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2751577. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2494031. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2736218. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2737095. 
Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2749729. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2750355. Maximum sequence length: 2049, sample length: 6425 [default0]:Skipping sample id=2756146. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2498864. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2479321. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2746184. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2731110. Maximum sequence length: 2049, sample length: 8151 [default0]:Skipping sample id=2468881. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2729550. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2724630. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2736286. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2752434. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2469340. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2727673. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2494373. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2737143. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2748270. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2466513. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2722200. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2734322. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2740551. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2739853. 
Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2747895. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2712571. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2711592. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2749596. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2735170. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2743601. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2716177. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2740168. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2471289. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2716678. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2750983. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2727260. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2498051. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2726160. Maximum sequence length: 2049, sample length: 3590 [default0]:Skipping sample id=2483099. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2749738. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2484432. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2734046. Maximum sequence length: 2049, sample length: 3068 [default0]:Skipping sample id=2739864. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2720788. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2724560. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2722655. 
Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2484481. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2736464. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2727470. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2733679. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2743208. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2721591. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2745047. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2735593. Maximum sequence length: 2049, sample length: 7775 [default0]:Skipping sample id=2723881. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2490646. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2748127. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2746139. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2466612. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2728148. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2733862. Maximum sequence length: 2049, sample length: 4351 [default0]:Skipping sample id=2714887. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2477251. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2740747. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2712034. Maximum sequence length: 2049, sample length: 7271 [default0]:Skipping sample id=2488226. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2733466. Maximum sequence length: 2049, sample length: 5429 [default0]:Skipping sample id=2716701. 
Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2719108. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2751081. Maximum sequence length: 2049, sample length: 5758 [default0]:Skipping sample id=2489685. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2735695. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2732249. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2712314. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2739915. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2756110. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2720893. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2499377. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2730994. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2492318. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2481220. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749358. Maximum sequence length: 2049, sample length: 5536 [default0]:Skipping sample id=2492363. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2477766. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2737611. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2753738. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2728270. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2731980. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2754221. Maximum sequence length: 2049, sample length: 6216 [default0]:Skipping sample id=2738368. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2723222. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2739037. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2718412. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2717984. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2496990. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2711560. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2720293. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2755804. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2755597. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2729507. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2748224. Maximum sequence length: 2049, sample length: 6264 [default0]:Skipping sample id=2711047. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2732714. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2713824. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2716301. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2483667. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2711787. Maximum sequence length: 2049, sample length: 6934 [default0]:Skipping sample id=2711861. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2726894. Maximum sequence length: 2049, sample length: 4681 [default0]:Skipping sample id=2747319. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2728524. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2718653. 
Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2744366. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2735206. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2713652. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2726610. Maximum sequence length: 2049, sample length: 2746 [default0]:Skipping sample id=2756609. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2727899. Maximum sequence length: 2049, sample length: 4376 [default0]:Skipping sample id=2722660. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2735341. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2719648. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2748125. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2716772. Maximum sequence length: 2049, sample length: 4168 [default0]:Skipping sample id=2733052. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2720141. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2749875. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2736107. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2489360. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2730079. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2729107. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2750807. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2750088. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2498385. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2498943. 
Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2750947. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2749739. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2480057. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2486693. Maximum sequence length: 2049, sample length: 4277 [default0]:Skipping sample id=2720419. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2719077. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2478164. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2484707. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2737062. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2741870. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2751645. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2494484. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2713021. Maximum sequence length: 2049, sample length: 6409 [default0]:Skipping sample id=2724832. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2752489. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2715360. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2735685. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2750010. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2749688. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2752096. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2484157. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2752512. 
Maximum sequence length: 2049, sample length: 4083 [default0]:Skipping sample id=2739735. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2711638. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2738974. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2481381. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2740106. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2754942. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2715220. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2481722. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2732966. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2726463. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2719665. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2743162. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2753799. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2751110. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2738313. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2717394. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2746680. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2743674. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2725660. Maximum sequence length: 2049, sample length: 3826 [default0]:Skipping sample id=2713735. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2731591. Maximum sequence length: 2049, sample length: 4772 [default0]:Skipping sample id=2494104. 
Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2749417. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2744927. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2743659. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2479566. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2732065. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2495951. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2736549. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2731215. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2467562. Maximum sequence length: 2049, sample length: 3237 [default0]:Skipping sample id=2756389. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2715410. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2748835. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2498565. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2752397. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2748897. Maximum sequence length: 2049, sample length: 4594 [default0]:Skipping sample id=2714314. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2715791. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2756701. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2748764. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2749408. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2717855. Maximum sequence length: 2049, sample length: 6228 [default0]:Skipping sample id=2755740. 
Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2744509. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2755325. Maximum sequence length: 2049, sample length: 2816 [default0]:Skipping sample id=2742240. Maximum sequence length: 2049, sample length: 5523 [default0]:Skipping sample id=2748128. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2739574. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2745981. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2745876. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2743094. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2715428. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2716483. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2731382. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2736731. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2488799. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2751016. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2713161. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2715135. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2725326. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2751392. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2738573. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2710994. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2743844. Maximum sequence length: 2049, sample length: 3120 [default0]:Skipping sample id=2751996. 
Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2714931. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2712073. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2729754. Maximum sequence length: 2049, sample length: 2704 [default0]:Skipping sample id=2737079. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2742635. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2748326. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2750497. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2484546. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2746018. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2736124. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2724823. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2497975. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2740537. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2754038. Maximum sequence length: 2049, sample length: 4914 [default0]:Skipping sample id=2739558. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2497383. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2721111. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2725764. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2753085. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2493998. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2713170. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2741937. 
Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2752525. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2735935. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2721518. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2726026. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2712581. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2725056. Maximum sequence length: 2049, sample length: 5242 [default0]:Skipping sample id=2711148. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2750841. Maximum sequence length: 2049, sample length: 2678 [default0]:Skipping sample id=2482837. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2713523. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2721322. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2748949. Maximum sequence length: 2049, sample length: 3953 [default0]:Skipping sample id=2740858. Maximum sequence length: 2049, sample length: 3570 [default0]:Skipping sample id=2741780. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727607. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2731431. Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2467951. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2729338. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2715035. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2719976. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2738787. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2737848. 
Maximum sequence length: 2049, sample length: 5801 [default0]:Skipping sample id=2738556. Maximum sequence length: 2049, sample length: 5615 [default0]:Skipping sample id=2485266. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2743945. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2721182. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2731407. Maximum sequence length: 2049, sample length: 4372 [default0]:Skipping sample id=2734110. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2736267. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2741722. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2726278. Maximum sequence length: 2049, sample length: 4389 [default0]:Skipping sample id=2733335. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2491184. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2757116. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2746754. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2753308. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2742114. Maximum sequence length: 2049, sample length: 6489 [default0]:Skipping sample id=2488409. Maximum sequence length: 2049, sample length: 2515 [default0]:Skipping sample id=2728752. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2723701. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2716448. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2732713. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2725751. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2716199. 
Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2722469. Maximum sequence length: 2049, sample length: 5353 [default0]:Skipping sample id=2482253. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2486350. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2756783. Maximum sequence length: 2049, sample length: 3795 [default0]:Skipping sample id=2717819. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2489704. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2711772. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2736190. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2727440. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2719107. Maximum sequence length: 2049, sample length: 5423 [default0]:Skipping sample id=2738788. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2722022. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2733156. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2729398. Maximum sequence length: 2049, sample length: 4360 [default0]:Skipping sample id=2749443. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2731534. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2717022. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2725509. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2729049. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2492283. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2749807. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2728550. 
Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2753684. Maximum sequence length: 2049, sample length: 4663 [default0]:Skipping sample id=2483263. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2747546. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2719628. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2734724. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2489969. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2714585. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2741234. Maximum sequence length: 2049, sample length: 4524 [default0]:Skipping sample id=2738948. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2739147. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2752282. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2742622. Maximum sequence length: 2049, sample length: 4221 [default0]:Skipping sample id=2744507. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2711167. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2721484. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2719625. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2756706. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2716407. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2746313. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2721678. Maximum sequence length: 2049, sample length: 3423 [default0]:Skipping sample id=2716633. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715563. 
Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2719311. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2718121. Maximum sequence length: 2049, sample length: 5381 [default0]:Skipping sample id=2729977. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2726935. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2492245. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2746411. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2715541. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2748808. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2734280. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2712384. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2741622. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2481047. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2733300. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2740889. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2734087. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2751412. Maximum sequence length: 2049, sample length: 5054 [default0]:Skipping sample id=2720978. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2724066. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2715987. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2711064. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2715387. Maximum sequence length: 2049, sample length: 3448 [default0]:Skipping sample id=2738046. 
Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2743517. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2493282. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2725134. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2744499. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2752179. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2745887. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2724438. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2719809. Maximum sequence length: 2049, sample length: 4005 [default0]:Skipping sample id=2717391. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2726186. Maximum sequence length: 2049, sample length: 4574 [default0]:Skipping sample id=2737191. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2745762. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2483651. Maximum sequence length: 2049, sample length: 4093 [default0]:Skipping sample id=2711043. Maximum sequence length: 2049, sample length: 4245 [default0]:Skipping sample id=2723153. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2488133. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2716078. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2741033. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2736460. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2755506. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2730332. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2751660. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2752714. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2733080. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2711798. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2743519. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2732404. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2491349. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2731734. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2733283. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2725937. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2483323. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2748223. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2724694. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2494242. Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2722685. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2748569. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2496788. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2716862. Maximum sequence length: 2049, sample length: 4510 [default0]:Skipping sample id=2492950. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2716790. Maximum sequence length: 2049, sample length: 3278 [default0]:Skipping sample id=2744652. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2739967. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2746104. 
Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2731771. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2718323. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2750525. Maximum sequence length: 2049, sample length: 4404 [default0]:Skipping sample id=2755468. Maximum sequence length: 2049, sample length: 4560 [default0]:Skipping sample id=2743510. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2711775. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2731944. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2725558. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2748541. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2713837. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2712767. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2491909. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2739783. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2753917. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2745513. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2717872. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2728430. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2718659. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2743038. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2743158. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2723170. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2731996. 
Maximum sequence length: 2049, sample length: 4249 [default0]:Skipping sample id=2731351. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2470034. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2754175. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2742546. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2730710. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2487558. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2755170. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2477115. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2754010. Maximum sequence length: 2049, sample length: 3808 [default0]:Skipping sample id=2713006. Maximum sequence length: 2049, sample length: 5042 [default0]:Skipping sample id=2750353. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2478500. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2730607. Maximum sequence length: 2049, sample length: 5954 [default0]:Skipping sample id=2740514. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2729861. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2756378. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2750399. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2726387. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2497868. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2739562. Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2721342. Maximum sequence length: 2049, sample length: 4076 [default0]:Skipping sample id=2727951. 
Maximum sequence length: 2049, sample length: 3164 [default0]:Skipping sample id=2731594. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2729466. Maximum sequence length: 2049, sample length: 3654 [default0]:Skipping sample id=2727826. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2492523. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2723782. Maximum sequence length: 2049, sample length: 2947 [default0]:Skipping sample id=2488653. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2720531. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2713779. Maximum sequence length: 2049, sample length: 5121 [default0]:Skipping sample id=2719132. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2733667. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2749785. Maximum sequence length: 2049, sample length: 4566 [default0]:Skipping sample id=2486219. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2722468. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2741607. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2712462. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2712329. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2714113. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2725853. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2732640. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2746976. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2736359. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2729952. 
Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2711538. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2720546. Maximum sequence length: 2049, sample length: 4806 [default0]:Skipping sample id=2737283. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2731825. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2723495. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2734414. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2738121. Maximum sequence length: 2049, sample length: 4853 [default0]:Skipping sample id=2755430. Maximum sequence length: 2049, sample length: 3817 [default0]:Skipping sample id=2754248. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2751482. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2727562. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2498904. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2747706. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2723816. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2738434. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2488019. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2731841. Maximum sequence length: 2049, sample length: 4428 [default0]:Skipping sample id=2746525. Maximum sequence length: 2049, sample length: 3790 [default0]:Skipping sample id=2725601. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2483764. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2748000. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2719614. 
Maximum sequence length: 2049, sample length: 4424 [default0]:Skipping sample id=2747244. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2724830. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2720392. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2749642. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2728720. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2734587. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2467563. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2715477. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2713789. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2742888. Maximum sequence length: 2049, sample length: 5316 [default0]:Skipping sample id=2748694. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2481959. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2755663. Maximum sequence length: 2049, sample length: 4044 [default0]:Skipping sample id=2730923. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2728640. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2756958. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2735909. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2724172. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2748973. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2734236. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2713351. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2718857. 
Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2745588. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2486052. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2744837. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737507. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2713903. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2748115. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2746198. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2747463. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2742692. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2742840. Maximum sequence length: 2049, sample length: 6256 [default0]:Skipping sample id=2742637. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2754963. Maximum sequence length: 2049, sample length: 3470 [default0]:Skipping sample id=2725747. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2726204. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2748629. Maximum sequence length: 2049, sample length: 6108 [default0]:Skipping sample id=2731922. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2717887. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2747398. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2739312. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2722433. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740183. Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2731632. 
Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2484204. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2485346. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2736640. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2493563. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2741941. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2730814. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2724289. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2748950. Maximum sequence length: 2049, sample length: 4113 [default0]:Skipping sample id=2748470. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2749852. Maximum sequence length: 2049, sample length: 5820 [default0]:Skipping sample id=2717277. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2494561. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2496585. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2745204. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2733576. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2754725. Maximum sequence length: 2049, sample length: 4805 [default0]:Skipping sample id=2740297. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2725323. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2753475. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2749982. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2722427. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2466019. 
Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2728910. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2749484. Maximum sequence length: 2049, sample length: 4140 [default0]:Skipping sample id=2751619. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2755024. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2749474. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2742107. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2724038. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2742803. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2751222. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2487875. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2747055. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2755424. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2491606. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2711983. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2740121. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2471032. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2724552. Maximum sequence length: 2049, sample length: 4336 [default0]:Skipping sample id=2733402. Maximum sequence length: 2049, sample length: 3736 [default0]:Skipping sample id=2723279. Maximum sequence length: 2049, sample length: 6416 [default0]:Skipping sample id=2743806. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2756755. Maximum sequence length: 2049, sample length: 8038 [default0]:Skipping sample id=2741821. 
Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2732147. Maximum sequence length: 2049, sample length: 4178 [default0]:Skipping sample id=2747790. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2711435. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2711145. Maximum sequence length: 2049, sample length: 5202 [default0]:Skipping sample id=2755414. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2722759. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2747751. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2729985. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2754907. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2738925. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2728035. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2753064. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2731331. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2494605. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2738234. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2715330. Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2720826. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2482929. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2728247. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2748300. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2741271. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2718147. 
Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2466552. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2483031. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2713195. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2729907. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2476982. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2731421. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2744358. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2723382. Maximum sequence length: 2049, sample length: 4378 [default0]:Skipping sample id=2726807. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2477379. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2490343. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733834. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2721074. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2745643. Maximum sequence length: 2049, sample length: 3478 [default0]:Skipping sample id=2715352. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2714425. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2717295. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2716189. Maximum sequence length: 2049, sample length: 4823 [default0]:Skipping sample id=2734946. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2755136. Maximum sequence length: 2049, sample length: 4472 [default0]:Skipping sample id=2724575. Maximum sequence length: 2049, sample length: 2369 [default0]:Skipping sample id=2485017. 
Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2732133. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2728120. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2716031. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2747401. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2732721. Maximum sequence length: 2049, sample length: 4739 [default0]:Skipping sample id=2734398. Maximum sequence length: 2049, sample length: 4660 [default0]:Skipping sample id=2754093. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750795. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2728700. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2747197. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2745455. Maximum sequence length: 2049, sample length: 3575 [default0]:Skipping sample id=2727351. Maximum sequence length: 2049, sample length: 4150 [default0]:Skipping sample id=2735479. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2731668. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2487965. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2494866. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2735047. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2743210. Maximum sequence length: 2049, sample length: 4321 [default0]:Skipping sample id=2725610. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2735294. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2726194. Maximum sequence length: 2049, sample length: 4569 [default0]:Skipping sample id=2467620. 
Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2732617. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2467261. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2719816. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2720478. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2726843. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2742243. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2732275. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2719932. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2733105. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2753918. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2747309. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2730227. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2744629. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2714710. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2711742. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719717. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2736775. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2756058. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2726514. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2723189. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2733113. Maximum sequence length: 2049, sample length: 3218 [default0]:Skipping sample id=2711955. 
Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2736931. Maximum sequence length: 2049, sample length: 4356 [default0]:Skipping sample id=2745263. Maximum sequence length: 2049, sample length: 7506 [default0]:Skipping sample id=2485044. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2482226. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2753100. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2754315. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2727821. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2716405. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2720522. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2495304. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2722994. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2725925. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2486819. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2477738. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2753040. Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2719925. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2746417. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2754590. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2743602. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2497867. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2714081. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2741730. 
Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2711916. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2724620. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2725652. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2726445. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2737679. Maximum sequence length: 2049, sample length: 7067 [default0]:Skipping sample id=2478441. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2723525. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2469599. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2750074. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2730539. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2470799. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2484572. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2496116. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2749020. Maximum sequence length: 2049, sample length: 6442 [default0]:Skipping sample id=2738351. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2727790. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2714121. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2722056. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2719491. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2467357. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2482334. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2720285. 
Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2739111. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2755658. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2734510. Maximum sequence length: 2049, sample length: 8496 [default0]:Skipping sample id=2725253. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2732845. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2751059. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2754335. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2739412. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2719962. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2715540. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2486197. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2732308. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2716389. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2713672. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2714112. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2721058. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2753590. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2745541. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2724287. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2754582. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2721143. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2734068. 
Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2723282. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2736434. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2713902. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2722199. Maximum sequence length: 2049, sample length: 5327 [default0]:Skipping sample id=2754703. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2728501. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2721044. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2738183. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2735199. Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2725797. Maximum sequence length: 2049, sample length: 3766 [default0]:Skipping sample id=2737157. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2489318. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2745386. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2713939. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2722872. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2738308. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2742662. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2724794. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2716961. Maximum sequence length: 2049, sample length: 5644 [default0]:Skipping sample id=2721133. Maximum sequence length: 2049, sample length: 4214 [default0]:Skipping sample id=2718618. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2754285. 
Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2755054. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2725677. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2756003. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2749244. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2743535. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2716913. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2729525. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2489875. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2729327. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2721214. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2748051. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2489570. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2744582. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2732883. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2487329. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2729785. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2714847. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2726432. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2756915. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2742985. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2494421. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2496382. 
Maximum sequence length: 2049, sample length: 2585 [default0]:Skipping sample id=2745866. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2723420. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2495366. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2716721. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2750534. Maximum sequence length: 2049, sample length: 4045 [default0]:Skipping sample id=2479395. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2723136. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2711854. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2482244. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2713448. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2741065. Maximum sequence length: 2049, sample length: 4497 [default0]:Skipping sample id=2722942. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2745024. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2747534. Maximum sequence length: 2049, sample length: 4136 [default0]:Skipping sample id=2734031. Maximum sequence length: 2049, sample length: 5248 [default0]:Skipping sample id=2721173. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2727892. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742490. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2751872. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2751218. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2746823. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2713909. 
Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2747908. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2713024. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2750912. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2720053. Maximum sequence length: 2049, sample length: 4597 [default0]:Skipping sample id=2722347. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2733559. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2721991. Maximum sequence length: 2049, sample length: 4517 [default0]:Skipping sample id=2740429. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2722818. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2492864. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2753455. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2715509. Maximum sequence length: 2049, sample length: 5163 [default0]:Skipping sample id=2489791. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753033. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2727729. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2755274. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2739897. Maximum sequence length: 2049, sample length: 4425 [default0]:Skipping sample id=2748761. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2734616. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2738273. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2718197. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2747064. 
Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2718710. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2756393. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2716276. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2725266. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2729705. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2740984. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2738564. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2720199. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2721588. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2726440. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2718754. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2732924. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2485398. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2712912. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2739520. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2724797. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2468589. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2742010. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2733633. Maximum sequence length: 2049, sample length: 6533 [default0]:Skipping sample id=2486652. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2717710. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2747002. 
Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2719466. Maximum sequence length: 2049, sample length: 5554 [default0]:Skipping sample id=2748147. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2730097. Maximum sequence length: 2049, sample length: 5512 [default0]:Skipping sample id=2724277. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2740758. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2483388. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2713803. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2497708. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2756384. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2466906. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2727672. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2735197. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2727453. Maximum sequence length: 2049, sample length: 6428 [default0]:Skipping sample id=2469332. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2718599. Maximum sequence length: 2049, sample length: 6151 [default0]:Skipping sample id=2727147. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2720188. Maximum sequence length: 2049, sample length: 5087 [default0]:Skipping sample id=2757064. Maximum sequence length: 2049, sample length: 3912 [default0]:Skipping sample id=2738064. Maximum sequence length: 2049, sample length: 3558 [default0]:Skipping sample id=2717163. Maximum sequence length: 2049, sample length: 7271 [default0]:Skipping sample id=2721010. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2488623. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2717874. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2480101. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2727401. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2751988. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2751295. Maximum sequence length: 2049, sample length: 3378 [default0]:Skipping sample id=2731376. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2489502. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2753829. Maximum sequence length: 2049, sample length: 4773 [default0]:Skipping sample id=2484390. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2714614. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2489509. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2729390. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2720057. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2718541. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2726296. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2721162. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2750337. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2749461. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2717724. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2466264. Maximum sequence length: 2049, sample length: 2803 [default0]:Skipping sample id=2719451. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2741124. 
Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2735760. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2751106. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2739370. Maximum sequence length: 2049, sample length: 3638 [default0]:Skipping sample id=2714612. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2498485. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2713769. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2756823. Maximum sequence length: 2049, sample length: 3130 [default0]:Skipping sample id=2733801. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2747407. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2490306. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2743724. Maximum sequence length: 2049, sample length: 5271 [default0]:Skipping sample id=2737864. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2735278. Maximum sequence length: 2049, sample length: 2906 [default0]:Skipping sample id=2722478. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2721265. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2738691. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2746698. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2752837. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2717967. Maximum sequence length: 2049, sample length: 3484 [default0]:Skipping sample id=2721476. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2738826. Maximum sequence length: 2049, sample length: 4250 [default0]:Skipping sample id=2731438. 
Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2721158. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2711126. Maximum sequence length: 2049, sample length: 5938 [default0]:Skipping sample id=2747935. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2722212. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2737698. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2745210. Maximum sequence length: 2049, sample length: 4554 [default0]:Skipping sample id=2719208. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2712506. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2752269. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2721167. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2751553. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2722735. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2723642. Maximum sequence length: 2049, sample length: 4480 [default0]:Skipping sample id=2715175. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2744054. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2713546. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2741774. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2714392. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2499309. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2724839. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2725574. Maximum sequence length: 2049, sample length: 4189 [default0]:Skipping sample id=2715662. 
Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2723606. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2723938. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2723008. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2727708. Maximum sequence length: 2049, sample length: 6482 [default0]:Skipping sample id=2727229. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2488769. Maximum sequence length: 2049, sample length: 3027 [default0]:Skipping sample id=2722106. Maximum sequence length: 2049, sample length: 4388 [default0]:Skipping sample id=2752950. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2740585. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2481584. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2739377. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2498263. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2756946. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2741648. Maximum sequence length: 2049, sample length: 3149 [default0]:Skipping sample id=2481388. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2738043. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2745413. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2733454. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2750026. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2731586. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2747373. Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2721465. 
Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2721403. Maximum sequence length: 2049, sample length: 5209 [default0]:Skipping sample id=2735881. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2739806. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2743860. Maximum sequence length: 2049, sample length: 3532 [default0]:Skipping sample id=2741236. Maximum sequence length: 2049, sample length: 4167 [default0]:Skipping sample id=2734433. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2734375. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2744959. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2735956. Maximum sequence length: 2049, sample length: 3742 [default0]:Skipping sample id=2716123. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2721463. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2757063. Maximum sequence length: 2049, sample length: 4139 [default0]:Skipping sample id=2725793. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719241. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2723377. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2717443. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2722747. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2718067. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2737347. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2724941. Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2718410. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2717308. 
Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2743254. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2715717. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2734768. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2717203. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2711817. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2726676. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2467616. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2737606. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2717813. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2494632. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2480002. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2482860. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2717143. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2734878. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2755724. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2477238. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2731958. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2718470. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2738510. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2714284. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2748095. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2491284. 
Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2741696. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2752876. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2733494. Maximum sequence length: 2049, sample length: 3904 [default0]:Skipping sample id=2754941. Maximum sequence length: 2049, sample length: 3461 [default0]:Skipping sample id=2714675. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2752496. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2754232. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2730185. Maximum sequence length: 2049, sample length: 5225 [default0]:Skipping sample id=2753743. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2744166. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2719678. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2751931. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2756415. Maximum sequence length: 2049, sample length: 3414 [default0]:Skipping sample id=2745072. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2710983. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2736530. Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2752977. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2740335. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2483111. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2754704. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2745034. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2739010. 
Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2728125. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2729786. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2719994. Maximum sequence length: 2049, sample length: 3880 [default0]:Skipping sample id=2728441. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2735202. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2732978. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2755287. Maximum sequence length: 2049, sample length: 4094 [default0]:Skipping sample id=2730943. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2736504. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2484082. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2713888. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2736211. Maximum sequence length: 2049, sample length: 3950 [default0]:Skipping sample id=2746019. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2736688. Maximum sequence length: 2049, sample length: 4123 [default0]:Skipping sample id=2731289. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2733881. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2735236. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2754750. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2716894. Maximum sequence length: 2049, sample length: 2768 [default0]:Skipping sample id=2742045. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2724298. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2714940. 
Maximum sequence length: 2049, sample length: 4535 [default0]:Skipping sample id=2730169. Maximum sequence length: 2049, sample length: 3694 [default0]:Skipping sample id=2491076. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2736224. Maximum sequence length: 2049, sample length: 3129 [default0]:Skipping sample id=2730976. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2736868. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2720475. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2751742. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2731116. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2732270. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2722642. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2750333. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2730375. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2727572. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2715779. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2742146. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2728365. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2740351. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2490636. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2720477. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2725635. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2494929. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2721057. 
Maximum sequence length: 2049, sample length: 4567 [default0]:Skipping sample id=2743345. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2747077. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2715924. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2723195. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2714594. Maximum sequence length: 2049, sample length: 3119 [default0]:Skipping sample id=2717568. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2743788. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2714003. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2733785. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2720672. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2482023. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2737260. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2719026. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2757069. Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2715273. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2749752. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2738898. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2737662. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2732359. Maximum sequence length: 2049, sample length: 3765 [default0]:Skipping sample id=2733005. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2718356. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2745548. 
Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2740569. Maximum sequence length: 2049, sample length: 4530 [default0]:Skipping sample id=2730926. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2717121. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2739374. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2712589. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2721161. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2718494. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2725686. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2737450. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2749975. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731072. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2744019. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2738583. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2494355. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2739171. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2495732. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2754527. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2741436. Maximum sequence length: 2049, sample length: 3124 [default0]:Skipping sample id=2747738. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2483748. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2711225. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2713910. 
Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2728324. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2739060. Maximum sequence length: 2049, sample length: 3130 [default0]:Skipping sample id=2487324. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2740909. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2721621. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2742966. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2487487. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2720685. Maximum sequence length: 2049, sample length: 3439 [default0]:Skipping sample id=2488506. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2740253. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2718178. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2739941. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2498254. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2742616. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2747441. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2726730. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2729895. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2730735. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489384. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2470045. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2727785. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2752556. 
Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2734030. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2488395. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2756956. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2754780. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2737378. Maximum sequence length: 2049, sample length: 4306 [default0]:Skipping sample id=2751147. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2750846. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2753645. Maximum sequence length: 2049, sample length: 4347 [default0]:Skipping sample id=2753629. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2749378. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2736462. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2495192. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2736295. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2742864. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2713216. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2495307. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2734015. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2749767. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2726552. Maximum sequence length: 2049, sample length: 5816 [default0]:Skipping sample id=2729380. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2737803. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2735101. 
Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2738239. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2721340. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2494330. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2746326. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2745577. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2496067. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2470534. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2716193. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2495696. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2746193. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2479996. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2752239. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2722443. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2713801. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2717948. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2712317. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2724702. Maximum sequence length: 2049, sample length: 3028 [default0]:Skipping sample id=2732329. Maximum sequence length: 2049, sample length: 7273 [default0]:Skipping sample id=2732280. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2467217. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2747932. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2743765. 
Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2482525. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2490070. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2497311. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2492758. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2734203. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2738232. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2742824. Maximum sequence length: 2049, sample length: 2835 [default0]:Skipping sample id=2717138. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2721540. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2746721. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2753782. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2750264. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2748106. Maximum sequence length: 2049, sample length: 5809 [default0]:Skipping sample id=2724907. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2747816. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2730311. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2740801. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2466631. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2720646. Maximum sequence length: 2049, sample length: 5853 [default0]:Skipping sample id=2715068. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2746202. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2756565. 
Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2750121. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2490424. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2755908. Maximum sequence length: 2049, sample length: 4248 [default0]:Skipping sample id=2724515. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2730142. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2715657. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2714243. Maximum sequence length: 2049, sample length: 5207 [default0]:Skipping sample id=2477889. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2749873. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2741201. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2736522. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2717784. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2714287. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2741824. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2732222. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2725603. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2713374. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2482998. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2713241. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2752075. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2714720. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2715405. 
Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2498615. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2722431. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2717473. Maximum sequence length: 2049, sample length: 4919 [default0]:Skipping sample id=2717950. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2481909. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2751329. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2735629. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2719654. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2731049. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2467806. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2748586. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2749705. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2492532. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2735116. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2742193. Maximum sequence length: 2049, sample length: 6523 [default0]:Skipping sample id=2718738. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2731084. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2743176. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2720517. Maximum sequence length: 2049, sample length: 6431 [default0]:Skipping sample id=2711256. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2730600. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2752895. 
Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2751973. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2744279. Maximum sequence length: 2049, sample length: 3142 [default0]:Skipping sample id=2738216. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2722715. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2739596. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2712186. Maximum sequence length: 2049, sample length: 7105 [default0]:Skipping sample id=2485521. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2740407. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2486731. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2479899. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2481998. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2721693. Maximum sequence length: 2049, sample length: 3831 [default0]:Skipping sample id=2756020. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2717359. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2729406. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2741501. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2733277. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2742247. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2745786. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2725589. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2725127. Maximum sequence length: 2049, sample length: 3682 [default0]:Skipping sample id=2746955. 
Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2736206. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2746766. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2749805. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2728476. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2733421. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733321. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2738734. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2716964. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2729052. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2741842. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2496665. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2744453. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2726390. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2736904. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2727126. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2719527. Maximum sequence length: 2049, sample length: 3957 [default0]:Skipping sample id=2756671. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2739423. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2481962. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2735408. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2712861. Maximum sequence length: 2049, sample length: 3137 [default0]:Skipping sample id=2734717. 
Maximum sequence length: 2049, sample length: 4693 [default0]:Skipping sample id=2713778. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2718736. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2736000. Maximum sequence length: 2049, sample length: 5383 [default0]:Skipping sample id=2494456. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2754532. Maximum sequence length: 2049, sample length: 4130 [default0]:Skipping sample id=2739650. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2716080. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2724678. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2734445. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2466568. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2735108. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2718284. Maximum sequence length: 2049, sample length: 7210 [default0]:Skipping sample id=2740426. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2751801. Maximum sequence length: 2049, sample length: 4423 [default0]:Skipping sample id=2482539. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2737725. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2724858. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2739134. Maximum sequence length: 2049, sample length: 3064 [default0]:Skipping sample id=2725678. Maximum sequence length: 2049, sample length: 4608 [default0]:Skipping sample id=2489594. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2719595. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2731308. 
Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2715899. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2730656. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2714630. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2716354. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2731643. Maximum sequence length: 2049, sample length: 3341 [default0]:Skipping sample id=2490888. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2478350. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2734728. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2749135. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2747611. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2755857. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2712338. Maximum sequence length: 2049, sample length: 5747 [default0]:Skipping sample id=2726267. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2713590. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2748250. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2752604. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2719341. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2491020. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2716989. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2731985. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2740477. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2724595. 
Maximum sequence length: 2049, sample length: 5561 [default0]:Skipping sample id=2752281. Maximum sequence length: 2049, sample length: 5003 [default0]:Skipping sample id=2720570. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2725794. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2751540. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2480178. Maximum sequence length: 2049, sample length: 3385 [default0]:Skipping sample id=2751268. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2477848. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2736428. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2729911. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2724821. Maximum sequence length: 2049, sample length: 3094 [default0]:Skipping sample id=2479274. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2731723. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2748478. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2717642. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2752488. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2711363. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2755035. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2726863. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2729514. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2478761. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717067. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2728879. 
Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2737609. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2492992. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2756749. Maximum sequence length: 2049, sample length: 3477 [default0]:Skipping sample id=2724531. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2744466. Maximum sequence length: 2049, sample length: 4920 [default0]:Skipping sample id=2717028. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2756371. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2743175. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2727789. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2747956. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2736619. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2750835. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2733922. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2477525. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2741822. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2721734. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2740095. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2751875. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2735486. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2737537. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2733110. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2721239. 
Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2719729. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2751980. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2738114. Maximum sequence length: 2049, sample length: 4619 [default0]:Skipping sample id=2748853. Maximum sequence length: 2049, sample length: 5278 [default0]:Skipping sample id=2729816. Maximum sequence length: 2049, sample length: 6455 [default0]:Skipping sample id=2719810. Maximum sequence length: 2049, sample length: 3696 [default0]:Skipping sample id=2734366. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2747886. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2728292. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733054. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2732682. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2748957. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2727601. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2749666. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2734353. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2735905. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2752258. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2754498. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2754714. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2719121. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2712827. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2715723. 
Maximum sequence length: 2049, sample length: 8161 [default0]:Skipping sample id=2733180. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2751954. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2738205. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2725974. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2738040. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2749201. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2749435. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2466094. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2470780. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2478045. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2755289. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2737268. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2721306. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2492990. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2484905. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2751206. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736690. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2752821. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2729176. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2713696. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2745003. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2716009. 
Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2711760. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2721692. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2731679. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2715646. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2742208. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2719884. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2717310. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2735858. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2728613. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2734309. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2718442. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2722882. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2735169. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2754881. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2732868. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2746368. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2723361. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2753870. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2724339. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2732427. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2727246. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2725099. 
Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2726156. Maximum sequence length: 2049, sample length: 4546 [default0]:Skipping sample id=2730327. Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2729729. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2728405. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2493017. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2740083. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2749647. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2743970. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2727630. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2732303. Maximum sequence length: 2049, sample length: 6924 [default0]:Skipping sample id=2722050. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2718892. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2730921. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2726493. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2751195. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2711716. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2727459. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2752697. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2754924. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2742618. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2746144. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2743199. 
Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2754956. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2741535. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2718675. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2482946. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2733091. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2496155. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2715716. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2752835. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2724938. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2470505. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2467113. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2721560. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2748062. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2722720. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2754382. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2470716. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2750948. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2742115. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2720307. Maximum sequence length: 2049, sample length: 5990 [default0]:Skipping sample id=2737587. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2481071. Maximum sequence length: 2049, sample length: 4272 [default0]:Skipping sample id=2720124. 
Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2741922. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2752841. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2738995. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753349. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2497140. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2716205. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2726627. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2742323. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2735168. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2713624. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2480549. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2738383. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2728433. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2739773. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2733846. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2724077. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2744525. Maximum sequence length: 2049, sample length: 5552 [default0]:Skipping sample id=2746433. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2719151. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2743932. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2727976. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2726153. 
Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2735187. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713703. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2478532. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2721987. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2745698. Maximum sequence length: 2049, sample length: 4531 [default0]:Skipping sample id=2477273. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2735665. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2488620. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2728523. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2722266. Maximum sequence length: 2049, sample length: 4578 [default0]:Skipping sample id=2466312. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2482919. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2716132. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2747961. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2754740. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2497594. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2731347. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2744140. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2491043. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2726009. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2753462. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2481533. 
Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2477811. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2731839. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2746606. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2714270. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2755677. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2739744. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2717202. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2739888. Maximum sequence length: 2049, sample length: 4949 [default0]:Skipping sample id=2730964. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2750915. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2733845. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2483514. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2714440. Maximum sequence length: 2049, sample length: 6863 [default0]:Skipping sample id=2745291. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2717248. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2722261. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2742382. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2733222. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2713742. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2749775. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2715586. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2736474. 
Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2466746. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2740767. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2756857. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2732278. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2750508. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2711245. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2728098. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2487141. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2721660. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2718886. Maximum sequence length: 2049, sample length: 5583 [default0]:Skipping sample id=2488833. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2756464. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2730717. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2736391. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2754765. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2718066. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2734301. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2742690. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2740138. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2711099. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2756816. Maximum sequence length: 2049, sample length: 4315 [default0]:Skipping sample id=2734108. 
Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2723413. Maximum sequence length: 2049, sample length: 7217 [default0]:Skipping sample id=2722061. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2488935. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2728198. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2729872. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2724769. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2732517. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2720208. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2467198. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2498232. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2737932. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2730940. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2757097. Maximum sequence length: 2049, sample length: 5872 [default0]:Skipping sample id=2729858. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2745843. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2719765. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2721499. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2735165. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2756382. Maximum sequence length: 2049, sample length: 4368 [default0]:Skipping sample id=2728384. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2482360. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2741694. 
Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2722042. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2733261. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2718816. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2719511. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2755806. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2737432. Maximum sequence length: 2049, sample length: 6759 [default0]:Skipping sample id=2738402. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2745471. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2745619. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2755581. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2484169. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2719778. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2747865. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2739424. Maximum sequence length: 2049, sample length: 5021 [default0]:Skipping sample id=2712649. Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2714120. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2714436. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2752518. Maximum sequence length: 2049, sample length: 5018 [default0]:Skipping sample id=2751549. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2719142. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2711809. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2729588. 
Maximum sequence length: 2049, sample length: 4693 [default0]:Skipping sample id=2468394. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2714779. Maximum sequence length: 2049, sample length: 4803 [default0]:Skipping sample id=2717865. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2739549. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2744522. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2742173. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2750475. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2721972. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2726698. Maximum sequence length: 2049, sample length: 2798 [default0]:Skipping sample id=2712679. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2735551. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2756633. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2484279. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2730660. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2725414. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2726004. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2735001. Maximum sequence length: 2049, sample length: 4413 [default0]:Skipping sample id=2725459. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2733041. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2734391. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2727267. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2481808. 
Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2735876. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2755120. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2734111. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2746979. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2719939. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2497937. Maximum sequence length: 2049, sample length: 4274 [default0]:Skipping sample id=2716347. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2750382. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2737392. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2737817. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2735779. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2735926. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2739159. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2486807. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2753964. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2486741. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732291. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2753346. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2745810. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2719509. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2755420. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2741087. 
Maximum sequence length: 2049, sample length: 3419 [default0]:Skipping sample id=2756188. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2731712. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2724215. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2756153. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2724251. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2754373. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2737048. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2731058. Maximum sequence length: 2049, sample length: 3311 [default0]:Skipping sample id=2749403. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2713237. Maximum sequence length: 2049, sample length: 2573 [default0]:Skipping sample id=2733849. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2753358. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2736407. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2755864. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2724037. Maximum sequence length: 2049, sample length: 5128 [default0]:Skipping sample id=2710981. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2720706. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2721204. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2731612. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2740033. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2756021. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2732471. 
Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2485955. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2750225. Maximum sequence length: 2049, sample length: 3018 [default0]:Skipping sample id=2752299. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2748209. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2719871. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2481273. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2750670. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2733117. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2754019. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2713123. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2730279. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2465922. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2715711. Maximum sequence length: 2049, sample length: 4211 [default0]:Skipping sample id=2755072. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2739510. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2498056. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2747668. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2491967. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2741814. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2728685. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2733431. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2734188. 
Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2751226. Maximum sequence length: 2049, sample length: 5494 [default0]:Skipping sample id=2487371. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2711634. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2748609. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2715992. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2743412. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2719896. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2726993. Maximum sequence length: 2049, sample length: 3320 [default0]:Skipping sample id=2756955. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2723052. Maximum sequence length: 2049, sample length: 4549 [default0]:Skipping sample id=2735787. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2490382. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2712326. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2721611. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2750007. Maximum sequence length: 2049, sample length: 4474 [default0]:Skipping sample id=2747359. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2490095. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2719115. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2727958. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2727286. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2747170. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2756247. 
Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2729012. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2726114. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2491972. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2756679. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2741840. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2753191. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2712029. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2727272. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2732558. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2492966. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2724015. Maximum sequence length: 2049, sample length: 5590 [default0]:Skipping sample id=2747098. Maximum sequence length: 2049, sample length: 4354 [default0]:Skipping sample id=2733343. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2755707. Maximum sequence length: 2049, sample length: 2498 [default0]:Skipping sample id=2728436. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2711309. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2720321. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2735106. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2730588. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2751719. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2727165. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2478149. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2723650. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2496213. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2491240. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2741326. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2721493. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2728401. Maximum sequence length: 2049, sample length: 3398 [default0]:Skipping sample id=2751781. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2721657. Maximum sequence length: 2049, sample length: 3293 [default0]:Skipping sample id=2725498. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2713679. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2477467. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2736496. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2713849. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2715179. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2719823. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2724072. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2713739. Maximum sequence length: 2049, sample length: 4127 [default0]:Skipping sample id=2725476. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2734475. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2730442. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2467805. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2482884. 
Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2743323. Maximum sequence length: 2049, sample length: 4599 [default0]:Skipping sample id=2717280. Maximum sequence length: 2049, sample length: 5070 [default0]:Skipping sample id=2742943. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2712750. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2749532. Maximum sequence length: 2049, sample length: 4382 [default0]:Skipping sample id=2740245. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2737263. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2731236. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2724865. Maximum sequence length: 2049, sample length: 4992 [default0]:Skipping sample id=2732630. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2720152. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2719191. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2722488. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2489583. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2722589. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2756910. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2736729. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2496464. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2750980. Maximum sequence length: 2049, sample length: 7562 [default0]:Skipping sample id=2731491. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2733687. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2483191. 
Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2716033. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2712694. Maximum sequence length: 2049, sample length: 4252 [default0]:Skipping sample id=2753754. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2734747. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2498154. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2735406. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2751733. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2737703. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2720092. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2735767. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2742417. Maximum sequence length: 2049, sample length: 3969 [default0]:Skipping sample id=2720019. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2749836. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2751078. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2723077. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2720308. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2753247. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2711873. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2716419. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2715207. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2720727. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2726643. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2498164. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2750774. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2718797. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2483360. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2732837. Maximum sequence length: 2049, sample length: 6482 [default0]:Skipping sample id=2480475. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2715570. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2734352. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2727739. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2738500. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2738388. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2751022. Maximum sequence length: 2049, sample length: 5437 [default0]:Skipping sample id=2734189. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2726308. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2725401. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2714398. Maximum sequence length: 2049, sample length: 3957 [default0]:Skipping sample id=2748229. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2495046. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2727415. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2715178. Maximum sequence length: 2049, sample length: 5990 [default0]:Skipping sample id=2751164. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2720635. 
Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2716530. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2471117. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2714972. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2743423. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2751118. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2723842. Maximum sequence length: 2049, sample length: 3454 [default0]:Skipping sample id=2738212. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2724645. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2724222. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2724107. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2740861. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2726036. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2728419. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2467707. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2751080. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711572. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2720240. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2731802. Maximum sequence length: 2049, sample length: 4504 [default0]:Skipping sample id=2736397. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2753306. Maximum sequence length: 2049, sample length: 2421 [default0]:Skipping sample id=2726759. Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2711018. 
Maximum sequence length: 2049, sample length: 5013 [default0]:Skipping sample id=2725837. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2729060. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2755012. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2469804. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498656. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2717218. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2754406. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2484998. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2734757. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2743711. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2719172. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2730470. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2470675. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2745282. Maximum sequence length: 2049, sample length: 4289 [default0]:Skipping sample id=2730258. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2731253. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2721623. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2736759. Maximum sequence length: 2049, sample length: 3295 [default0]:Skipping sample id=2497370. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2743496. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2720846. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2722608. 
Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2735668. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2720922. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2755517. Maximum sequence length: 2049, sample length: 4295 [default0]:Skipping sample id=2727357. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2753646. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2754507. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2752228. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2711066. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2731966. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2467525. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2746412. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2489776. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2722774. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2717624. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2751471. Maximum sequence length: 2049, sample length: 4049 [default0]:Skipping sample id=2482957. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2720421. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2712244. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2753538. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2730663. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2721661. Maximum sequence length: 2049, sample length: 4437 [default0]:Skipping sample id=2723015. 
Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2755295. Maximum sequence length: 2049, sample length: 6560 [default0]:Skipping sample id=2714578. Maximum sequence length: 2049, sample length: 3778 [default0]:Skipping sample id=2719606. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2718907. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2725083. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2715681. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2497675. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2714884. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2735868. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2716944. Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2497580. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2728569. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2745050. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2736159. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2469451. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2731223. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2755780. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2722489. Maximum sequence length: 2049, sample length: 5524 [default0]:Skipping sample id=2714199. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2486067. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2741338. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2745405. 
Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2718503. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2749832. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2728225. Maximum sequence length: 2049, sample length: 3474 [default0]:Skipping sample id=2740081. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2752163. Maximum sequence length: 2049, sample length: 4676 [default0]:Skipping sample id=2719233. Maximum sequence length: 2049, sample length: 4572 [default0]:Skipping sample id=2496681. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2740964. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2752279. Maximum sequence length: 2049, sample length: 4527 [default0]:Skipping sample id=2499398. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2735488. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2729325. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2494940. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2727632. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2745744. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2717548. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2482408. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2733259. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2722701. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2720916. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2719580. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2741141. 
Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2467249. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2756925. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2743464. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2732263. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2717321. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2722451. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2712791. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2495497. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2729999. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2749820. Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2737568. Maximum sequence length: 2049, sample length: 3947 [default0]:Skipping sample id=2721815. Maximum sequence length: 2049, sample length: 4119 [default0]:Skipping sample id=2747657. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2728305. Maximum sequence length: 2049, sample length: 4924 [default0]:Skipping sample id=2742209. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2487897. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2752689. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2493717. Maximum sequence length: 2049, sample length: 3097 [default0]:Skipping sample id=2736155. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2744344. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2752765. Maximum sequence length: 2049, sample length: 4313 [default0]:Skipping sample id=2723600. 
Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2754480. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2720741. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2467015. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2745763. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2495216. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2716687. Maximum sequence length: 2049, sample length: 4718 [default0]:Skipping sample id=2755620. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2743692. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2719517. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2743236. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2741073. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2746128. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2745819. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2751885. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2754789. Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2724421. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2734051. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2735734. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2487486. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2720763. Maximum sequence length: 2049, sample length: 4233 [default0]:Skipping sample id=2750885. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2725757. 
Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2716565. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2741110. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2733602. Maximum sequence length: 2049, sample length: 3671 [default0]:Skipping sample id=2484813. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2732811. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2747537. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2755356. Maximum sequence length: 2049, sample length: 4108 [default0]:Skipping sample id=2748146. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2744034. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2749045. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2492236. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2753501. Maximum sequence length: 2049, sample length: 4625 [default0]:Skipping sample id=2479719. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2729511. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2496306. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2731919. Maximum sequence length: 2049, sample length: 4420 [default0]:Skipping sample id=2750324. Maximum sequence length: 2049, sample length: 3370 [default0]:Skipping sample id=2753018. Maximum sequence length: 2049, sample length: 4941 [default0]:Skipping sample id=2739349. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2478356. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2728544. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2724501. 
Maximum sequence length: 2049, sample length: 5214 [default0]:Skipping sample id=2746700. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2720637. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2741513. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2715282. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2737728. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2741129. Maximum sequence length: 2049, sample length: 5264 [default0]:Skipping sample id=2496333. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2727795. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2746469. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2744648. Maximum sequence length: 2049, sample length: 4443 [default0]:Skipping sample id=2733200. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2750162. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2483113. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2713802. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2719910. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2712395. Maximum sequence length: 2049, sample length: 3284 [default0]:Skipping sample id=2746886. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2734318. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2729366. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2753294. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2731516. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2711214. 
Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2736059. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2725588. Maximum sequence length: 2049, sample length: 2592 [default0]:Skipping sample id=2747781. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2744186. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2749597. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2747950. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2722098. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2725014. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2718409. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2746471. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2724264. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2743312. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2727499. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2736998. Maximum sequence length: 2049, sample length: 3792 [default0]:Skipping sample id=2715121. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2740275. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2732545. Maximum sequence length: 2049, sample length: 3395 [default0]:Skipping sample id=2740820. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2717678. Maximum sequence length: 2049, sample length: 4764 [default0]:Skipping sample id=2713783. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2734022. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2745661. 
Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2743891. Maximum sequence length: 2049, sample length: 6434 [default0]:Skipping sample id=2751322. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2747150. Maximum sequence length: 2049, sample length: 4529 [default0]:Skipping sample id=2737563. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2735110. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2746851. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2713744. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2481020. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2711045. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2739178. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2729599. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2730096. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2483877. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2727241. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2743738. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2727434. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2466873. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724933. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2743766. Maximum sequence length: 2049, sample length: 5147 [default0]:Skipping sample id=2748258. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2749853. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2728242. 
Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2734982. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2729188. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2756658. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2711631. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2482156. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2756134. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2719392. Maximum sequence length: 2049, sample length: 6639 [default0]:Skipping sample id=2467206. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2484671. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2723531. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2739229. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2714524. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2719490. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2750134. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2738505. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2489136. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724775. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2716459. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2718290. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2715915. Maximum sequence length: 2049, sample length: 4073 [default0]:Skipping sample id=2493162. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2713669. 
Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2712723. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2751233. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2727782. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2724892. Maximum sequence length: 2049, sample length: 3108 [default0]:Skipping sample id=2756618. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2717112. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2466234. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2737890. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2737635. Maximum sequence length: 2049, sample length: 3127 [default0]:Skipping sample id=2731416. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2732454. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2494567. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2716747. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2741847. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2742300. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2727127. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2728302. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2716128. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2738472. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2746783. Maximum sequence length: 2049, sample length: 4753 [default0]:Skipping sample id=2470503. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2726822. 
Maximum sequence length: 2049, sample length: 3814 [default0]:Skipping sample id=2728374. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2735963. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2717703. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729606. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2736485. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2737337. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2739182. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2719058. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2755718. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2487690. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2470526. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2721840. Maximum sequence length: 2049, sample length: 2749 [default0]:Skipping sample id=2732002. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2721101. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2750787. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2482634. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2722093. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2724737. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2714782. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2735792. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2747112. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2716548. 
Maximum sequence length: 2049, sample length: 6684 [default0]:Skipping sample id=2711163. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2739363. Maximum sequence length: 2049, sample length: 4494 [default0]:Skipping sample id=2747106. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2723235. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2740662. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2734426. Maximum sequence length: 2049, sample length: 4081 [default0]:Skipping sample id=2727692. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2465773. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2736895. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2728393. Maximum sequence length: 2049, sample length: 5032 [default0]:Skipping sample id=2483440. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2719310. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2753827. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2484932. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2740583. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2481344. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2725328. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2481030. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2723773. Maximum sequence length: 2049, sample length: 5265 [default0]:Skipping sample id=2734789. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2733130. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2738293. 
Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2755293. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2723747. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2755419. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2499135. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2728996. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2755977. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2721677. Maximum sequence length: 2049, sample length: 6691 [default0]:Skipping sample id=2489154. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2495359. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2711804. Maximum sequence length: 2049, sample length: 4166 [default0]:Skipping sample id=2722090. Maximum sequence length: 2049, sample length: 5651 [default0]:Skipping sample id=2750968. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2730087. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2743963. Maximum sequence length: 2049, sample length: 6212 [default0]:Skipping sample id=2737989. Maximum sequence length: 2049, sample length: 4687 [default0]:Skipping sample id=2721031. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2748815. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2752797. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2739517. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2735552. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2735873. Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2753695. 
Maximum sequence length: 2049, sample length: 3702 [default0]:Skipping sample id=2722501. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2723173. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2748622. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2740269. Maximum sequence length: 2049, sample length: 4197 [default0]:Skipping sample id=2752779. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2717911. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2725718. Maximum sequence length: 2049, sample length: 4261 [default0]:Skipping sample id=2725975. Maximum sequence length: 2049, sample length: 3614 [default0]:Skipping sample id=2466248. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2735224. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2734545. Maximum sequence length: 2049, sample length: 3294 [default0]:Skipping sample id=2722952. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2726706. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2730828. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2721041. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2750383. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2495475. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2754564. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2746038. Maximum sequence length: 2049, sample length: 3051 [default0]:Skipping sample id=2750873. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2728717. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2734698. 
Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2746014. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2741342. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2742256. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2490840. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2738677. Maximum sequence length: 2049, sample length: 3431 [default0]:Skipping sample id=2731615. Maximum sequence length: 2049, sample length: 3029 [default0]:Skipping sample id=2741640. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2729566. Maximum sequence length: 2049, sample length: 3411 [default0]:Skipping sample id=2722980. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2729567. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2754945. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2715597. Maximum sequence length: 2049, sample length: 5257 [default0]:Skipping sample id=2729500. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2741749. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2741062. Maximum sequence length: 2049, sample length: 4549 [default0]:Skipping sample id=2752263. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2724429. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2724528. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2750002. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2722463. Maximum sequence length: 2049, sample length: 4756 [default0]:Skipping sample id=2725471. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2743501. 
Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2710998. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2725167. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2743733. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2711343. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2717289. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2734097. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2751981. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2745166. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2729154. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2728293. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2728912. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2484750. Maximum sequence length: 2049, sample length: 3451 [default0]:Skipping sample id=2478543. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2746359. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2477789. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2486287. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2479648. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2715389. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2718644. Maximum sequence length: 2049, sample length: 5258 [default0]:Skipping sample id=2719692. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2711879. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2726149. 
Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2752911. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2733743. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2479637. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2730126. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2732702. Maximum sequence length: 2049, sample length: 7095 [default0]:Skipping sample id=2742848. Maximum sequence length: 2049, sample length: 3610 [default0]:Skipping sample id=2711492. Maximum sequence length: 2049, sample length: 4046 [default0]:Skipping sample id=2496839. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2729056. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2738390. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2724370. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2755557. Maximum sequence length: 2049, sample length: 3768 [default0]:Skipping sample id=2726126. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2491689. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2730054. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2494782. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2727176. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2479128. Maximum sequence length: 2049, sample length: 3446 [default0]:Skipping sample id=2711406. Maximum sequence length: 2049, sample length: 4052 [default0]:Skipping sample id=2752435. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2494446. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2717555. 
Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2741651. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2735504. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2466316. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2721312. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2725849. Maximum sequence length: 2049, sample length: 4337 [default0]:Skipping sample id=2733876. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2721373. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2714091. Maximum sequence length: 2049, sample length: 5313 [default0]:Skipping sample id=2748126. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2740702. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2729287. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2755153. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2723931. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2717124. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2747712. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2712624. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2466224. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2743705. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2735133. Maximum sequence length: 2049, sample length: 6604 [default0]:Skipping sample id=2736935. Maximum sequence length: 2049, sample length: 2676 [default0]:Skipping sample id=2720348. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2727956. 
Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2754213. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2723772. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2477497. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2712587. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2468659. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2726369. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2739302. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2468772. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2737532. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2737477. Maximum sequence length: 2049, sample length: 5456 [default0]:Skipping sample id=2713964. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2711702. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2721052. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2753687. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2737888. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2736944. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2752466. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2721694. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2734937. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2750452. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2727275. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2718643. 
Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2726350. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2735736. Maximum sequence length: 2049, sample length: 5954 [default0]:Skipping sample id=2724382. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2743539. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2724580. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2713572. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2735330. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2752816. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2488541. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2732377. Maximum sequence length: 2049, sample length: 3150 [default0]:Skipping sample id=2716364. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2747427. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2721311. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2467867. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2733376. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2752807. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2494317. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2743632. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2747425. Maximum sequence length: 2049, sample length: 5036 [default0]:Skipping sample id=2742654. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2496209. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2746344. 
Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2721472. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2747848. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2730456. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2716231. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2748653. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2744432. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2727553. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2727190. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2741511. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2731559. Maximum sequence length: 2049, sample length: 4235 [default0]:Skipping sample id=2735908. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2712865. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2726804. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2739726. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2731982. Maximum sequence length: 2049, sample length: 5779 [default0]:Skipping sample id=2727011. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2750227. Maximum sequence length: 2049, sample length: 4110 [default0]:Skipping sample id=2723097. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2743068. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2485226. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2729454. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2493882. 
Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2755139. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2732919. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2733140. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2717507. Maximum sequence length: 2049, sample length: 4485 [default0]:Skipping sample id=2745516. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2466891. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2747933. Maximum sequence length: 2049, sample length: 14228 [default0]:Skipping sample id=2753954. Maximum sequence length: 2049, sample length: 4034 [default0]:Skipping sample id=2479138. Maximum sequence length: 2049, sample length: 2914 [default0]:Skipping sample id=2746921. Maximum sequence length: 2049, sample length: 4847 [default0]:Skipping sample id=2730051. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2722145. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2741659. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2750796. Maximum sequence length: 2049, sample length: 6452 [default0]:Skipping sample id=2496765. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2714588. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2736803. Maximum sequence length: 2049, sample length: 4697 [default0]:Skipping sample id=2732140. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2726860. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2740681. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2737281. Maximum sequence length: 2049, sample length: 3374 [default0]:Skipping sample id=2723485. 
Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2727031. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2722046. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2719819. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2719603. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2478932. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2734637. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2755639. Maximum sequence length: 2049, sample length: 4738 [default0]:Skipping sample id=2754556. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2743112. Maximum sequence length: 2049, sample length: 5487 [default0]:Skipping sample id=2493766. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2747762. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2739019. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2725897. Maximum sequence length: 2049, sample length: 3007 [default0]:Skipping sample id=2750377. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2479081. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2725912. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2752956. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2742786. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2747814. Maximum sequence length: 2049, sample length: 4614 [default0]:Skipping sample id=2729213. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2748946. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2712225. 
Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2481706. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2729011. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2741832. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2741392. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2727544. Maximum sequence length: 2049, sample length: 3865 [default0]:Skipping sample id=2746925. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2733959. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2737404. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2740532. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2740349. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2744326. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2716029. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2731810. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2746337. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2735542. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2720178. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2735584. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2755922. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2717630. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2755521. Maximum sequence length: 2049, sample length: 7512 [default0]:Skipping sample id=2754896. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2739239. 
Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2465972. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2717128. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2495409. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2466649. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2718811. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2737997. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2721806. Maximum sequence length: 2049, sample length: 2521 [default0]:Skipping sample id=2745315. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2733351. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2725581. Maximum sequence length: 2049, sample length: 5682 [default0]:Skipping sample id=2750439. Maximum sequence length: 2049, sample length: 3314 [default0]:Skipping sample id=2499052. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2734869. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2722267. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2746047. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2740955. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2750539. Maximum sequence length: 2049, sample length: 3680 [default0]:Skipping sample id=2735196. Maximum sequence length: 2049, sample length: 3995 [default0]:Skipping sample id=2494396. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2712276. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2730177. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2466102. 
Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2741792. Maximum sequence length: 2049, sample length: 4839 [default0]:Skipping sample id=2752738. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2751290. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2489369. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2718771. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2719192. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731607. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2752539. Maximum sequence length: 2049, sample length: 4603 [default0]:Skipping sample id=2732014. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2712219. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2730144. Maximum sequence length: 2049, sample length: 3819 [default0]:Skipping sample id=2747927. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2720576. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2726628. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2740342. Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2753263. Maximum sequence length: 2049, sample length: 5331 [default0]:Skipping sample id=2741040. Maximum sequence length: 2049, sample length: 5337 [default0]:Skipping sample id=2740890. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2730860. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2754333. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2754449. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2731735. 
Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2717968. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2737826. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2498784. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2470237. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2725128. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2752886. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2736871. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2751298. Maximum sequence length: 2049, sample length: 5346 [default0]:Skipping sample id=2731556. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2733239. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2715489. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2734578. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2468533. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2739118. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2717926. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2745606. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2478258. Maximum sequence length: 2049, sample length: 3347 [default0]:Skipping sample id=2741557. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2740066. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2724846. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2731664. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2483417. 
Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2490782. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2719523. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2753230. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2735389. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2744297. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2718713. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2716787. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2722440. Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2746544. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2730163. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2752287. Maximum sequence length: 2049, sample length: 4865 [default0]:Skipping sample id=2753254. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2745235. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2715906. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2744678. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2756789. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2716443. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2733570. Maximum sequence length: 2049, sample length: 3288 [default0]:Skipping sample id=2755847. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2732486. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2739216. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2738886. 
Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2499088. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2713537. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2713029. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2489844. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2493615. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2715143. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2733311. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2735847. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2736637. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2754409. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2745156. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2735826. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2713807. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2488211. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715219. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2751807. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2756402. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2724782. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2489005. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2749934. Maximum sequence length: 2049, sample length: 2490 [default0]:Skipping sample id=2743200. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2745313. 
Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2752385. Maximum sequence length: 2049, sample length: 3797 [default0]:Skipping sample id=2495591. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2746485. Maximum sequence length: 2049, sample length: 3938 [default0]:Skipping sample id=2750277. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2739874. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2753276. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2755685. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2467399. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2731552. Maximum sequence length: 2049, sample length: 2963 [default0]:Skipping sample id=2718837. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2719890. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2466523. Maximum sequence length: 2049, sample length: 2833 [default0]:Skipping sample id=2714709. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2749402. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2750274. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2734008. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2482201. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2720061. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2730917. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2730726. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2467518. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2468361. 
Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2722341. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2742552. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2722405. Maximum sequence length: 2049, sample length: 3616 [default0]:Skipping sample id=2725949. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2750084. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2730452. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2492641. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2492736. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2729039. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2736038. Maximum sequence length: 2049, sample length: 4135 [default0]:Skipping sample id=2753413. Maximum sequence length: 2049, sample length: 5179 [default0]:Skipping sample id=2756311. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2481769. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2478029. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2717726. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2477691. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2752656. Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2712886. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2489099. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2723179. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2747692. Maximum sequence length: 2049, sample length: 3503 [default0]:Skipping sample id=2747217. 
Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2725020. Maximum sequence length: 2049, sample length: 2923 [default0]:Skipping sample id=2718809. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2753466. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2735371. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2493203. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2751320. Maximum sequence length: 2049, sample length: 4558 [default0]:Skipping sample id=2741520. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2751497. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2743828. Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2720808. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2751443. Maximum sequence length: 2049, sample length: 3898 [default0]:Skipping sample id=2754869. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2718239. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2750220. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2718604. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2716562. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2490533. Maximum sequence length: 2049, sample length: 2756 [default0]:Skipping sample id=2466601. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2733537. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2754892. Maximum sequence length: 2049, sample length: 5110 [default0]:Skipping sample id=2493543. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2730813. 
Maximum sequence length: 2049, sample length: 4147 [default0]:Skipping sample id=2743182. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2712014. Maximum sequence length: 2049, sample length: 3447 [default0]:Skipping sample id=2478534. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2727379. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2739810. Maximum sequence length: 2049, sample length: 5166 [default0]:Skipping sample id=2745396. Maximum sequence length: 2049, sample length: 4398 [default0]:Skipping sample id=2747765. Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2737374. Maximum sequence length: 2049, sample length: 3425 [default0]:Skipping sample id=2740864. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2469429. Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2737302. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2739473. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2471056. Maximum sequence length: 2049, sample length: 3095 [default0]:Skipping sample id=2470983. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2752948. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2739317. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2483747. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2717315. Maximum sequence length: 2049, sample length: 7105 [default0]:Skipping sample id=2754266. Maximum sequence length: 2049, sample length: 4795 [default0]:Skipping sample id=2493489. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2498828. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2752630. 
Maximum sequence length: 2049, sample length: 3251 [default0]:Skipping sample id=2732854. Maximum sequence length: 2049, sample length: 5737 [default0]:Skipping sample id=2725461. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2495914. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2752952. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2714342. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2712148. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2483089. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2753000. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2478573. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2713607. Maximum sequence length: 2049, sample length: 2528 [default0]:Skipping sample id=2722483. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2714208. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2732314. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2484219. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2734013. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2488803. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2494046. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2713422. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2729058. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2470958. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2752045. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2477004. 
Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2720926. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2746436. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2729552. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2736405. Maximum sequence length: 2049, sample length: 3867 [default0]:Skipping sample id=2742065. Maximum sequence length: 2049, sample length: 3049 [default0]:Skipping sample id=2731241. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2746664. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2719402. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2480656. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2717179. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2745844. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2484655. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2732921. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2478371. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2484969. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2737561. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2753594. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2723375. Maximum sequence length: 2049, sample length: 6550 [default0]:Skipping sample id=2734247. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2477989. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2726469. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2496870. 
Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2720468. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2723954. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2719294. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2754002. Maximum sequence length: 2049, sample length: 4824 [default0]:Skipping sample id=2726680. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2711675. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2723966. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2494301. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2755302. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2732800. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2733111. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2714865. Maximum sequence length: 2049, sample length: 3880 [default0]:Skipping sample id=2754301. Maximum sequence length: 2049, sample length: 3282 [default0]:Skipping sample id=2746167. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2490054. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2739556. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2720444. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2734538. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2756512. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2734138. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2732904. Maximum sequence length: 2049, sample length: 4122 [default0]:Skipping sample id=2711842. 
Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2736170. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2717833. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2739175. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2728724. Maximum sequence length: 2049, sample length: 5023 [default0]:Skipping sample id=2732836. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2470754. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2746893. Maximum sequence length: 2049, sample length: 4237 [default0]:Skipping sample id=2742207. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2712683. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2719653. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2736185. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2717565. Maximum sequence length: 2049, sample length: 4548 [default0]:Skipping sample id=2725844. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2492238. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2748670. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2714508. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2715300. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2466337. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2493092. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2714517. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2721548. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2744954. 
Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2738020. Maximum sequence length: 2049, sample length: 3791 [default0]:Skipping sample id=2714420. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2732440. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2721552. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2754625. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2498779. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2753296. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2749711. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2736560. Maximum sequence length: 2049, sample length: 3563 [default0]:Skipping sample id=2744441. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2721316. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2728381. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2752783. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2496004. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2740852. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2727488. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2717132. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2499339. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2732985. Maximum sequence length: 2049, sample length: 4761 [default0]:Skipping sample id=2746406. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2724419. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2756990. 
Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2720059. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2731987. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2746834. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2748301. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2721729. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2735918. Maximum sequence length: 2049, sample length: 3809 [default0]:Skipping sample id=2745483. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2723272. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2731217. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2744656. Maximum sequence length: 2049, sample length: 4559 [default0]:Skipping sample id=2754966. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2742296. Maximum sequence length: 2049, sample length: 5029 [default0]:Skipping sample id=2752684. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2737061. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2726170. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2735295. Maximum sequence length: 2049, sample length: 4793 [default0]:Skipping sample id=2749528. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2727305. Maximum sequence length: 2049, sample length: 3871 [default0]:Skipping sample id=2484473. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2483511. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2755787. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2749687. 
Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2738439. Maximum sequence length: 2049, sample length: 4276 [default0]:Skipping sample id=2471029. Maximum sequence length: 2049, sample length: 2910 [default0]:Skipping sample id=2714787. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2733540. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2477795. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2737930. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2752548. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2712638. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2479347. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2718984. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2739569. Maximum sequence length: 2049, sample length: 3287 [default0]:Skipping sample id=2726678. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2717619. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2733475. Maximum sequence length: 2049, sample length: 3734 [default0]:Skipping sample id=2720809. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2755678. Maximum sequence length: 2049, sample length: 3901 [default0]:Skipping sample id=2756664. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2752501. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2470670. Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2718175. Maximum sequence length: 2049, sample length: 4088 [default0]:Skipping sample id=2731046. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2732256. 
Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2741390. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2725230. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2752943. Maximum sequence length: 2049, sample length: 4612 [default0]:Skipping sample id=2729831. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2755626. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2751529. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2477173. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2730285. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2471002. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2716810. Maximum sequence length: 2049, sample length: 2506 [default0]:Skipping sample id=2733836. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2726535. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2713365. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2726374. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2747320. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2714426. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2737407. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2731523. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2744249. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2745717. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2736148. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2745893. 
Maximum sequence length: 2049, sample length: 5430 [default0]:Skipping sample id=2469791. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2746342. Maximum sequence length: 2049, sample length: 4042 [default0]:Skipping sample id=2731381. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2716100. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2471231. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2715517. Maximum sequence length: 2049, sample length: 5853 [default0]:Skipping sample id=2715585. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2730876. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2720411. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2744111. Maximum sequence length: 2049, sample length: 4744 [default0]:Skipping sample id=2748555. Maximum sequence length: 2049, sample length: 4665 [default0]:Skipping sample id=2743595. Maximum sequence length: 2049, sample length: 3893 [default0]:Skipping sample id=2756801. Maximum sequence length: 2049, sample length: 4102 [default0]:Skipping sample id=2717351. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2746665. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2480571. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2723578. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2744364. Maximum sequence length: 2049, sample length: 3756 [default0]:Skipping sample id=2745222. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2715102. Maximum sequence length: 2049, sample length: 4304 [default0]:Skipping sample id=2751948. Maximum sequence length: 2049, sample length: 4697 [default0]:Skipping sample id=2713455. 
Maximum sequence length: 2049, sample length: 5717 [default0]:Skipping sample id=2714776. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2724206. Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2736988. Maximum sequence length: 2049, sample length: 3623 [default0]:Skipping sample id=2717311. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2719053. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2499446. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2727617. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2711606. Maximum sequence length: 2049, sample length: 3330 [default0]:Skipping sample id=2750031. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2743485. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2715264. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2746769. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2753066. Maximum sequence length: 2049, sample length: 7073 [default0]:Skipping sample id=2712597. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2750270. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2714877. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2482511. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2735808. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2719667. Maximum sequence length: 2049, sample length: 4775 [default0]:Skipping sample id=2722063. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2738506. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2721567. 
Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2497318. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2755476. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2717743. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2715816. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2717510. Maximum sequence length: 2049, sample length: 5332 [default0]:Skipping sample id=2721489. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2711506. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2750256. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2747471. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2711824. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2483975. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2724459. Maximum sequence length: 2049, sample length: 4710 [default0]:Skipping sample id=2719549. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2485912. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2726035. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2739651. Maximum sequence length: 2049, sample length: 4002 [default0]:Skipping sample id=2730543. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2722605. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2465872. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2730013. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2729644. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2724199. 
Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2712719. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2739172. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2721308. Maximum sequence length: 2049, sample length: 2884 [default0]:Skipping sample id=2711267. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2489787. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2749631. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2750578. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2717994. Maximum sequence length: 2049, sample length: 3234 [default0]:Skipping sample id=2711230. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2742587. Maximum sequence length: 2049, sample length: 7102 [default0]:Skipping sample id=2493641. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2730834. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2733198. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2711420. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2722666. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2489104. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2715650. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2716880. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2727860. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2752524. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2712648. Maximum sequence length: 2049, sample length: 2774 [default0]:Skipping sample id=2718172. 
Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2712423. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2712223. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2736409. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2731025. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2498676. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2742736. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2729365. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2751234. Maximum sequence length: 2049, sample length: 3489 [default0]:Skipping sample id=2477362. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2748587. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2755656. Maximum sequence length: 2049, sample length: 4203 [default0]:Skipping sample id=2482222. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2748482. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2751017. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2718017. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2736173. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2741888. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2712294. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2713757. Maximum sequence length: 2049, sample length: 5583 [default0]:Skipping sample id=2727300. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2745556. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2755466. 
Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2716858. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2741414. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2726232. Maximum sequence length: 2049, sample length: 2884 [default0]:Skipping sample id=2481674. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2746186. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2740332. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2755813. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2713938. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2732378. Maximum sequence length: 2049, sample length: 3603 [default0]:Skipping sample id=2746183. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2716821. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2721896. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2751917. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2753963. Maximum sequence length: 2049, sample length: 2615 [default0]:Skipping sample id=2737409. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2495008. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2735124. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2753639. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2731563. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2494391. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2748612. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2746307. 
Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2736377. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2751687. Maximum sequence length: 2049, sample length: 4317 [default0]:Skipping sample id=2488939. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2743332. Maximum sequence length: 2049, sample length: 4098 [default0]:Skipping sample id=2748492. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2738872. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2724687. Maximum sequence length: 2049, sample length: 6472 [default0]:Skipping sample id=2720664. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2711424. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2751600. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2729685. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2493503. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2721303. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2738473. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2752091. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2737127. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2725249. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2712628. Maximum sequence length: 2049, sample length: 5357 [default0]:Skipping sample id=2718194. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2722387. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2756055. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2720669. 
Maximum sequence length: 2049, sample length: 4072 [default0]:Skipping sample id=2735035. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2720906. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2738592. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2752666. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2482986. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2721691. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2466254. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2729548. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2734467. Maximum sequence length: 2049, sample length: 4579 [default0]:Skipping sample id=2751648. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2489424. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2754655. Maximum sequence length: 2049, sample length: 2533 [default0]:Skipping sample id=2730032. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2718432. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2737523. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2739367. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2725504. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2740233. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2478449. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2727981. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2756120. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2718796. 
Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2476990. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2746640. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2720512. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2743004. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2736307. Maximum sequence length: 2049, sample length: 3215 [default0]:Skipping sample id=2735674. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2751363. Maximum sequence length: 2049, sample length: 4087 [default0]:Skipping sample id=2719768. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2740424. Maximum sequence length: 2049, sample length: 4534 [default0]:Skipping sample id=2719210. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2467846. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2730033. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2713593. Maximum sequence length: 2049, sample length: 2747 [default0]:Skipping sample id=2714486. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2467885. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2756734. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2728434. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2713346. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2711196. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2478492. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2719010. Maximum sequence length: 2049, sample length: 2557 [default0]:Skipping sample id=2739307. 
Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2742372. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2725730. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2732688. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2755966. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2735942. Maximum sequence length: 2049, sample length: 4147 [default0]:Skipping sample id=2716136. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2741653. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2746162. Maximum sequence length: 2049, sample length: 6399 [default0]:Skipping sample id=2731456. Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2481622. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2735795. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2466550. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2736921. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2742740. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2732432. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2729427. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2747662. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2489458. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2719068. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2733097. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2498086. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2730296. 
Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2722430. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2717219. Maximum sequence length: 2049, sample length: 2919 [default0]:Skipping sample id=2486842. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2755043. Maximum sequence length: 2049, sample length: 4254 [default0]:Skipping sample id=2713283. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2755023. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2477704. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2756927. Maximum sequence length: 2049, sample length: 3107 [default0]:Skipping sample id=2747194. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2755729. Maximum sequence length: 2049, sample length: 4363 [default0]:Skipping sample id=2740120. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2730982. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2737488. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2484428. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2752554. Maximum sequence length: 2049, sample length: 5033 [default0]:Skipping sample id=2481949. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2746131. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2739266. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2756992. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2722997. Maximum sequence length: 2049, sample length: 6309 [default0]:Skipping sample id=2747030. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2728265. 
Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2719579. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2755924. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2722494. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2725597. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2749908. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2741307. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2711078. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2720845. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2747257. Maximum sequence length: 2049, sample length: 3660 [default0]:Skipping sample id=2722067. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2751492. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2711237. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2724095. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2716651. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2745391. Maximum sequence length: 2049, sample length: 6480 [default0]:Skipping sample id=2714474. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2487713. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2744015. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2744482. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2713874. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2731240. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2729118. 
Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2753603. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2714857. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2477760. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2750041. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2724209. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2730780. Maximum sequence length: 2049, sample length: 2824 [default0]:Skipping sample id=2756479. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2733797. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2735958. Maximum sequence length: 2049, sample length: 3569 [default0]:Skipping sample id=2466142. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2737590. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2747708. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2477742. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2741772. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2719135. Maximum sequence length: 2049, sample length: 4148 [default0]:Skipping sample id=2750533. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2726392. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2471237. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2739482. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2715041. Maximum sequence length: 2049, sample length: 5231 [default0]:Skipping sample id=2744993. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2717241. 
Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2742293. Maximum sequence length: 2049, sample length: 3361 [default0]:Skipping sample id=2723914. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2729774. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2748745. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2743398. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2731878. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2492693. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2726913. Maximum sequence length: 2049, sample length: 4673 [default0]:Skipping sample id=2747259. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2750675. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2712331. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2749401. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718621. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2717470. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2749881. Maximum sequence length: 2049, sample length: 5152 [default0]:Skipping sample id=2741085. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2493530. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2728764. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2714803. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2487651. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2729536. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2471049. 
Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2728667. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2714252. Maximum sequence length: 2049, sample length: 4537 [default0]:Skipping sample id=2741592. Maximum sequence length: 2049, sample length: 4206 [default0]:Skipping sample id=2747392. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2747617. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2738297. Maximum sequence length: 2049, sample length: 4657 [default0]:Skipping sample id=2728603. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2733103. Maximum sequence length: 2049, sample length: 2470 [default0]:Skipping sample id=2734903. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2755511. Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2730671. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2757038. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2738755. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2727698. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2731166. Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2745809. Maximum sequence length: 2049, sample length: 3118 [default0]:Skipping sample id=2719337. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2465818. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2469274. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2745826. Maximum sequence length: 2049, sample length: 3502 [default0]:Skipping sample id=2737227. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2729904. 
Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2723764. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2471016. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2749064. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2467257. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2712709. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2477701. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2745366. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2734401. Maximum sequence length: 2049, sample length: 4142 [default0]:Skipping sample id=2751257. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2740710. Maximum sequence length: 2049, sample length: 3713 [default0]:Skipping sample id=2744897. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2755326. Maximum sequence length: 2049, sample length: 4246 [default0]:Skipping sample id=2740337. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2728924. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2715441. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2498358. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2739338. Maximum sequence length: 2049, sample length: 4799 [default0]:Skipping sample id=2751907. Maximum sequence length: 2049, sample length: 4077 [default0]:Skipping sample id=2721294. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2715424. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2747130. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2734123. 
Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2721826. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2721505. Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2739480. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2485561. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2751937. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2753619. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2486345. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2748467. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2737876. Maximum sequence length: 2049, sample length: 5325 [default0]:Skipping sample id=2716104. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2756868. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2753505. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2724276. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2729336. Maximum sequence length: 2049, sample length: 6007 [default0]:Skipping sample id=2715912. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2479037. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2489055. Maximum sequence length: 2049, sample length: 2477 [default0]:Skipping sample id=2470821. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2750593. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2726545. Maximum sequence length: 2049, sample length: 4900 [default0]:Skipping sample id=2740995. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2755525. 
Maximum sequence length: 2049, sample length: 2672 [default0]:Skipping sample id=2742007. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2717563. Maximum sequence length: 2049, sample length: 3290 [default0]:Skipping sample id=2725196. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2718634. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2729724. Maximum sequence length: 2049, sample length: 5544 [default0]:Skipping sample id=2730129. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2748247. Maximum sequence length: 2049, sample length: 4694 [default0]:Skipping sample id=2489615. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2726187. Maximum sequence length: 2049, sample length: 4995 [default0]:Skipping sample id=2737147. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2712711. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2748895. Maximum sequence length: 2049, sample length: 5154 [default0]:Skipping sample id=2750317. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2482984. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2730703. Maximum sequence length: 2049, sample length: 4172 [default0]:Skipping sample id=2485655. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2480992. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2719532. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2479940. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2477855. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2729252. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2718176. 
Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2733711. Maximum sequence length: 2049, sample length: 4618 [default0]:Skipping sample id=2745420. Maximum sequence length: 2049, sample length: 3895 [default0]:Skipping sample id=2746084. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2470138. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2712704. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2742712. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2717341. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2714608. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2738946. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2471291. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2729346. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2751245. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2748703. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2713447. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2727606. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2755406. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2488401. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2719821. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2752131. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2726489. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2718808. Maximum sequence length: 2049, sample length: 3146 [default0]:Skipping sample id=2744923. 
Maximum sequence length: 2049, sample length: 3702 [default0]:Skipping sample id=2731054. Maximum sequence length: 2049, sample length: 4218 [default0]:Skipping sample id=2755509. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2739151. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2498243. Maximum sequence length: 2049, sample length: 4282 [default0]:Skipping sample id=2715348. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2478074. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2742773. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2725410. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2712342. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2487459. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2738623. Maximum sequence length: 2049, sample length: 3803 [default0]:Skipping sample id=2727583. Maximum sequence length: 2049, sample length: 3061 [default0]:Skipping sample id=2746538. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2718544. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2495686. Maximum sequence length: 2049, sample length: 3514 [default0]:Skipping sample id=2724967. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2751953. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2717344. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2738713. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2750567. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2720375. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2725135. 
Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2743062. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2734438. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2755835. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2712922. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2477707. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2715746. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2744251. Maximum sequence length: 2049, sample length: 3805 [default0]:Skipping sample id=2721824. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2721019. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2737966. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2738806. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2751935. Maximum sequence length: 2049, sample length: 3914 [default0]:Skipping sample id=2715745. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2744725. Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2482657. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2723581. Maximum sequence length: 2049, sample length: 6335 [default0]:Skipping sample id=2478866. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2751437. Maximum sequence length: 2049, sample length: 3772 [default0]:Skipping sample id=2498863. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2753115. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2715392. Maximum sequence length: 2049, sample length: 3467 [default0]:Skipping sample id=2752772. 
Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2716611. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2738905. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2750420. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2742482. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2756911. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2721902. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2726788. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2728793. Maximum sequence length: 2049, sample length: 2987 [default0]:Skipping sample id=2478063. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2750864. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2729262. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2721589. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2721344. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2483400. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2728628. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2742342. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2728889. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2723334. Maximum sequence length: 2049, sample length: 4875 [default0]:Skipping sample id=2471076. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2738269. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2725051. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2738533. 
Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2732134. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2753429. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2719986. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2718758. Maximum sequence length: 2049, sample length: 4502 [default0]:Skipping sample id=2717002. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2714464. Maximum sequence length: 2049, sample length: 3353 [default0]:Skipping sample id=2716195. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2496080. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2750760. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736050. Maximum sequence length: 2049, sample length: 3917 [default0]:Skipping sample id=2715164. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2747429. Maximum sequence length: 2049, sample length: 4508 [default0]:Skipping sample id=2720413. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2740794. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2732866. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2713267. Maximum sequence length: 2049, sample length: 4960 [default0]:Skipping sample id=2481597. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2741429. Maximum sequence length: 2049, sample length: 4309 [default0]:Skipping sample id=2749438. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2742023. Maximum sequence length: 2049, sample length: 4998 [default0]:Skipping sample id=2488128. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2739197. 
Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2744492. Maximum sequence length: 2049, sample length: 4495 [default0]:Skipping sample id=2491350. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2721635. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2749791. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2466328. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2725350. Maximum sequence length: 2049, sample length: 6272 [default0]:Skipping sample id=2716942. Maximum sequence length: 2049, sample length: 7283 [default0]:Skipping sample id=2741488. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2714479. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2712402. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2745831. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2726403. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2720858. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2748287. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2725904. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2753041. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2712196. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2713065. Maximum sequence length: 2049, sample length: 2892 [default0]:Skipping sample id=2715213. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2718287. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2735596. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2756233. 
Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2738563. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2716983. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2756092. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2727893. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2729835. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2740608. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2738914. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2735886. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2720278. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2737357. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2735911. Maximum sequence length: 2049, sample length: 5532 [default0]:Skipping sample id=2719230. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2719622. Maximum sequence length: 2049, sample length: 2855 [default0]:Skipping sample id=2754537. Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2482397. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2722552. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2726028. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2749510. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2483889. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2736430. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2489986. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2735040. 
Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2486623. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2755733. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2742476. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2747555. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2742105. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2722663. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2471131. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2735033. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2718107. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2747475. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2730321. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2739533. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2736420. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2720556. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2737508. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2489501. Maximum sequence length: 2049, sample length: 3583 [default0]:Skipping sample id=2497228. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2740423. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2740059. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2718495. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2480452. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2752437. 
Maximum sequence length: 2049, sample length: 5835 [default0]:Skipping sample id=2732537. Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2751148. Maximum sequence length: 2049, sample length: 3620 [default0]:Skipping sample id=2730404. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2719652. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2742979. Maximum sequence length: 2049, sample length: 4594 [default0]:Skipping sample id=2718832. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2712794. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2715744. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2718723. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2750653. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2724897. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2748245. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2736721. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2748337. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2715261. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2749559. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2756449. Maximum sequence length: 2049, sample length: 4556 [default0]:Skipping sample id=2747955. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2733882. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2749607. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2723573. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2497929. 
Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2718310. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2486097. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2740093. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2487412. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2717201. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2728117. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2724022. Maximum sequence length: 2049, sample length: 5773 [default0]:Skipping sample id=2746951. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2746280. Maximum sequence length: 2049, sample length: 2978 [default0]:Skipping sample id=2478353. Maximum sequence length: 2049, sample length: 3197 [default0]:Skipping sample id=2733796. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756293. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2744041. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2715296. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2717645. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2745372. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2718669. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2744893. Maximum sequence length: 2049, sample length: 8121 [default0]:Skipping sample id=2714019. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2730465. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2716362. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2745621. 
Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2728761. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2487321. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2740395. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2720330. Maximum sequence length: 2049, sample length: 4184 [default0]:Skipping sample id=2720197. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2748461. Maximum sequence length: 2049, sample length: 4105 [default0]:Skipping sample id=2724155. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2479310. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2713745. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2482061. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2733917. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2737740. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2745189. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2730605. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2713595. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2741002. Maximum sequence length: 2049, sample length: 3202 [default0]:Skipping sample id=2732018. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2747871. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2719804. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2484659. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2751442. Maximum sequence length: 2049, sample length: 3649 [default0]:Skipping sample id=2491166. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2736352. Maximum sequence length: 2049, sample length: 3646 [default0]:Skipping sample id=2732246. Maximum sequence length: 2049, sample length: 5302 [default0]:Skipping sample id=2727663. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2737115. Maximum sequence length: 2049, sample length: 4065 [default0]:Skipping sample id=2752996. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2733310. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2719703. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2745408. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2723978. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2742802. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2749710. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2722794. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2731903. Maximum sequence length: 2049, sample length: 2592 [default0]:Skipping sample id=2736812. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2723347. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2719003. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2725246. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2727947. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2721055. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2716119. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2723989. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2747687. 
Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2715049. Maximum sequence length: 2049, sample length: 3835 [default0]:Skipping sample id=2729080. Maximum sequence length: 2049, sample length: 4971 [default0]:Skipping sample id=2718480. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2756676. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2737205. Maximum sequence length: 2049, sample length: 4836 [default0]:Skipping sample id=2717868. Maximum sequence length: 2049, sample length: 6302 [default0]:Skipping sample id=2753937. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2732860. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2739637. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2730446. Maximum sequence length: 2049, sample length: 5077 [default0]:Skipping sample id=2730603. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2747764. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2499067. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2728137. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2753181. Maximum sequence length: 2049, sample length: 5210 [default0]:Skipping sample id=2713662. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2740863. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2467071. Maximum sequence length: 2049, sample length: 2310 [default0]:Skipping sample id=2719668. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2751456. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2726513. Maximum sequence length: 2049, sample length: 5928 [default0]:Skipping sample id=2754378. 
Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2753421. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2717074. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2481302. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2730800. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2470786. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2721988. Maximum sequence length: 2049, sample length: 5319 [default0]:Skipping sample id=2734980. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2719060. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2736601. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2756332. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2722659. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2711289. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2742561. Maximum sequence length: 2049, sample length: 6760 [default0]:Skipping sample id=2714257. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2721601. Maximum sequence length: 2049, sample length: 3852 [default0]:Skipping sample id=2734080. Maximum sequence length: 2049, sample length: 4191 [default0]:Skipping sample id=2480362. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2720089. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2723717. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2743224. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2729122. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2715533. 
Maximum sequence length: 2049, sample length: 5146 [default0]:Skipping sample id=2717723. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2732035. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2754259. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2726911. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2729224. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2741948. Maximum sequence length: 2049, sample length: 3519 [default0]:Skipping sample id=2725573. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2723608. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2718989. Maximum sequence length: 2049, sample length: 5373 [default0]:Skipping sample id=2723025. Maximum sequence length: 2049, sample length: 4165 [default0]:Skipping sample id=2750291. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2745285. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2716124. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2735809. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2734849. Maximum sequence length: 2049, sample length: 3326 [default0]:Skipping sample id=2745584. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2731818. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2730045. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2729298. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2738636. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2725791. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2749137. 
Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2727429. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2712525. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2755193. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2725537. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2733866. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2711598. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2492524. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2494982. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2490012. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2723023. Maximum sequence length: 2049, sample length: 3644 [default0]:Skipping sample id=2713913. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2487620. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2490183. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2756571. Maximum sequence length: 2049, sample length: 3647 [default0]:Skipping sample id=2733021. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2742953. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2744092. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2737116. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2754688. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2725642. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2751696. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2724189. 
Maximum sequence length: 2049, sample length: 3525 [default0]:Skipping sample id=2729849. Maximum sequence length: 2049, sample length: 5155 [default0]:Skipping sample id=2728980. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2719027. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2733503. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2717634. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2716571. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2711281. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2493795. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2752614. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2730091. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2738371. Maximum sequence length: 2049, sample length: 6099 [default0]:Skipping sample id=2741569. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2720897. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2735672. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2712855. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2730768. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2749883. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2730315. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2712375. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2755292. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2714976. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2743673. 
Maximum sequence length: 2049, sample length: 4199 [default0]:Skipping sample id=2733358. Maximum sequence length: 2049, sample length: 4899 [default0]:Skipping sample id=2719714. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2742783. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2730727. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2720075. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2754080. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2713865. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2497860. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2728071. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2745791. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2484485. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2742421. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2756124. Maximum sequence length: 2049, sample length: 6146 [default0]:Skipping sample id=2739169. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2741494. Maximum sequence length: 2049, sample length: 2707 [default0]:Skipping sample id=2724726. Maximum sequence length: 2049, sample length: 3658 [default0]:Skipping sample id=2717229. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2713227. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2480038. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2466518. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2486872. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2746788. 
Maximum sequence length: 2049, sample length: 3875 [default0]:Skipping sample id=2746424. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2751264. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2479376. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2747009. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2722291. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2487114. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2752006. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2735945. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2479467. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2486614. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2753015. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2750444. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2740284. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2753789. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2714739. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2735450. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2725668. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2739717. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2747134. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2733085. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2729018. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2712473. 
Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2743817. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2739757. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2469031. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2724371. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2711086. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2493559. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2726876. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2716667. Maximum sequence length: 2049, sample length: 3643 [default0]:Skipping sample id=2755555. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2481063. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2720490. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2744345. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2721160. Maximum sequence length: 2049, sample length: 4268 [default0]:Skipping sample id=2746985. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2740535. Maximum sequence length: 2049, sample length: 2888 [default0]:Skipping sample id=2711995. Maximum sequence length: 2049, sample length: 3674 [default0]:Skipping sample id=2467451. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2468252. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2741624. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2753376. Maximum sequence length: 2049, sample length: 4692 [default0]:Skipping sample id=2720155. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2753189. 
Maximum sequence length: 2049, sample length: 3710 [default0]:Skipping sample id=2729156. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2729081. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2489779. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484347. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2746906. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2725780. Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2737420. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2720768. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2730278. Maximum sequence length: 2049, sample length: 4769 [default0]:Skipping sample id=2756397. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2733206. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2726336. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2733957. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2468197. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2726815. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2755540. Maximum sequence length: 2049, sample length: 4294 [default0]:Skipping sample id=2715994. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2721536. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2736591. Maximum sequence length: 2049, sample length: 3147 [default0]:Skipping sample id=2721135. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2725407. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2482017. 
Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2750611. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2726111. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2722969. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2483602. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2479973. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2721178. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2749287. Maximum sequence length: 2049, sample length: 4409 [default0]:Skipping sample id=2733817. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2713687. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2736463. Maximum sequence length: 2049, sample length: 4303 [default0]:Skipping sample id=2754603. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2744006. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2745116. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2468696. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2738870. Maximum sequence length: 2049, sample length: 4786 [default0]:Skipping sample id=2714058. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2739393. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2721104. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2499058. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2738095. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2478672. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2719701. 
Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2714179. Maximum sequence length: 2049, sample length: 3421 [default0]:Skipping sample id=2751739. Maximum sequence length: 2049, sample length: 3546 [default0]:Skipping sample id=2735097. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2711356. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2719565. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2482551. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2737044. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2729112. Maximum sequence length: 2049, sample length: 3677 [default0]:Skipping sample id=2717158. Maximum sequence length: 2049, sample length: 3405 [default0]:Skipping sample id=2752925. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2754826. Maximum sequence length: 2049, sample length: 2234 [default0]:Skipping sample id=2487087. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2731282. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2729008. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2718265. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2714062. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2738203. Maximum sequence length: 2049, sample length: 4095 [default0]:Skipping sample id=2727191. Maximum sequence length: 2049, sample length: 4156 [default0]:Skipping sample id=2468405. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2488260. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2754345. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2711615. 
Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2484943. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2717530. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2727439. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2736046. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2755630. Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2731532. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732780. Maximum sequence length: 2049, sample length: 3939 [default0]:Skipping sample id=2479902. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2741833. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2720982. Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2722253. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2717016. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2729477. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2727598. Maximum sequence length: 2049, sample length: 5336 [default0]:Skipping sample id=2749512. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2749660. Maximum sequence length: 2049, sample length: 5450 [default0]:Skipping sample id=2739825. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2741571. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726541. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2752695. Maximum sequence length: 2049, sample length: 3998 [default0]:Skipping sample id=2730747. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2739384. 
Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2735556. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2750852. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2497289. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2720422. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2746939. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2738140. Maximum sequence length: 2049, sample length: 3329 [default0]:Skipping sample id=2717481. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2716085. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2729756. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2717450. Maximum sequence length: 2049, sample length: 6492 [default0]:Skipping sample id=2747898. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2489916. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2750939. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2748831. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2739425. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2711082. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2494539. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2733986. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2753726. Maximum sequence length: 2049, sample length: 4078 [default0]:Skipping sample id=2730658. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2742050. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2744387. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2718546. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2720796. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2755960. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2743931. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2749758. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2731129. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2469219. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2714924. Maximum sequence length: 2049, sample length: 4597 [default0]:Skipping sample id=2731592. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2744381. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2746756. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2749206. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2712994. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2713898. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2715552. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2728070. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2712303. Maximum sequence length: 2049, sample length: 6956 [default0]:Skipping sample id=2749111. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2714281. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2740887. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2732781. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2722732. 
Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2712061. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2715077. Maximum sequence length: 2049, sample length: 7785 [default0]:Skipping sample id=2749427. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2743761. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2748646. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2714104. Maximum sequence length: 2049, sample length: 4572 [default0]:Skipping sample id=2716045. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2757082. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2719148. Maximum sequence length: 2049, sample length: 3970 [default0]:Skipping sample id=2721106. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2488608. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2712457. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2470037. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2725182. Maximum sequence length: 2049, sample length: 4604 [default0]:Skipping sample id=2483978. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2726548. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2726636. Maximum sequence length: 2049, sample length: 5191 [default0]:Skipping sample id=2756497. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2739895. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2753268. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2752465. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2733563. 
Maximum sequence length: 2049, sample length: 4171 [default0]:Skipping sample id=2735798. Maximum sequence length: 2049, sample length: 2942 [default0]:Skipping sample id=2741828. Maximum sequence length: 2049, sample length: 3075 [default0]:Skipping sample id=2747743. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2722480. Maximum sequence length: 2049, sample length: 3172 [default0]:Skipping sample id=2489938. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2730996. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2739527. Maximum sequence length: 2049, sample length: 4558 [default0]:Skipping sample id=2731663. Maximum sequence length: 2049, sample length: 3921 [default0]:Skipping sample id=2719252. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2716134. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2735349. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2733486. Maximum sequence length: 2049, sample length: 4928 [default0]:Skipping sample id=2481366. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2749036. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2722820. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2715553. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2739559. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2497449. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2715284. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2721410. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2712436. Maximum sequence length: 2049, sample length: 4548 [default0]:Skipping sample id=2747716. 
Maximum sequence length: 2049, sample length: 7109 [default0]:Skipping sample id=2742666. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2716032. Maximum sequence length: 2049, sample length: 4134 [default0]:Skipping sample id=2712023. Maximum sequence length: 2049, sample length: 3277 [default0]:Skipping sample id=2713048. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2468323. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2733195. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737657. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2736402. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2715472. Maximum sequence length: 2049, sample length: 4501 [default0]:Skipping sample id=2493312. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2713142. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2495118. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2716953. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2731131. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2748819. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2721584. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2731584. Maximum sequence length: 2049, sample length: 4031 [default0]:Skipping sample id=2722010. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2750584. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2736772. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2754397. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739887. 
Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2492494. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2725006. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2741175. Maximum sequence length: 2049, sample length: 3232 [default0]:Skipping sample id=2723705. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2489683. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2741732. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2731617. Maximum sequence length: 2049, sample length: 4852 [default0]:Skipping sample id=2741939. Maximum sequence length: 2049, sample length: 5704 [default0]:Skipping sample id=2747337. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2742539. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2744649. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2487515. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2732389. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2732917. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2756528. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2724377. Maximum sequence length: 2049, sample length: 8241 [default0]:Skipping sample id=2733422. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2736511. Maximum sequence length: 2049, sample length: 3263 [default0]:Skipping sample id=2744272. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2738224. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2477772. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2755503. 
Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2733266. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2734109. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2712757. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2718909. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2725875. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2726205. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2739258. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2742791. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2717793. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2716600. Maximum sequence length: 2049, sample length: 2709 [default0]:Skipping sample id=2470249. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2728096. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2716151. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2748117. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2727034. Maximum sequence length: 2049, sample length: 2922 [default0]:Skipping sample id=2748148. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2746372. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2737161. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2721132. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2723986. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2735859. Maximum sequence length: 2049, sample length: 3345 [default0]:Skipping sample id=2727610. 
Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2712367. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2734011. Maximum sequence length: 2049, sample length: 3749 [default0]:Skipping sample id=2728142. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2743155. Maximum sequence length: 2049, sample length: 5174 [default0]:Skipping sample id=2732320. Maximum sequence length: 2049, sample length: 4156 [default0]:Skipping sample id=2722349. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2713141. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2718648. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2720602. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2724161. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2716420. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2741603. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2723287. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2720682. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2719855. Maximum sequence length: 2049, sample length: 4293 [default0]:Skipping sample id=2741921. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2756196. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2498815. Maximum sequence length: 2049, sample length: 2507 [default0]:Skipping sample id=2753698. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2713155. Maximum sequence length: 2049, sample length: 7210 [default0]:Skipping sample id=2747998. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2711815. 
Maximum sequence length: 2049, sample length: 2601 [default0]:Skipping sample id=2754149. Maximum sequence length: 2049, sample length: 6439 [default0]:Skipping sample id=2467010. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2719694. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2740259. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2737956. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2494039. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2747643. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2716694. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2740113. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2481746. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2746895. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2729247. Maximum sequence length: 2049, sample length: 5363 [default0]:Skipping sample id=2717698. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2744537. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2721263. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2733179. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2747019. Maximum sequence length: 2049, sample length: 3836 [default0]:Skipping sample id=2726289. Maximum sequence length: 2049, sample length: 4275 [default0]:Skipping sample id=2735385. Maximum sequence length: 2049, sample length: 3665 [default0]:Skipping sample id=2722256. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2484344. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2745199. 
Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2756167. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2737794. Maximum sequence length: 2049, sample length: 5204 [default0]:Skipping sample id=2733513. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2487165. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2722604. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711480. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2713767. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2719371. Maximum sequence length: 2049, sample length: 2969 [default0]:Skipping sample id=2748165. Maximum sequence length: 2049, sample length: 4159 [default0]:Skipping sample id=2713018. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2717348. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2752346. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2746967. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721384. Maximum sequence length: 2049, sample length: 5945 [default0]:Skipping sample id=2740455. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2733271. Maximum sequence length: 2049, sample length: 2727 [default0]:Skipping sample id=2742760. Maximum sequence length: 2049, sample length: 4936 [default0]:Skipping sample id=2712249. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2741423. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714061. Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2470746. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2748834. 
Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2720425. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2721334. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2742923. Maximum sequence length: 2049, sample length: 5205 [default0]:Skipping sample id=2742389. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2498145. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2467195. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2742697. Maximum sequence length: 2049, sample length: 3948 [default0]:Skipping sample id=2716245. Maximum sequence length: 2049, sample length: 3987 [default0]:Skipping sample id=2732862. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2738796. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2739235. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2480629. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2466464. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2740326. Maximum sequence length: 2049, sample length: 3574 [default0]:Skipping sample id=2755713. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2730972. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2756678. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2721894. Maximum sequence length: 2049, sample length: 3573 [default0]:Skipping sample id=2742088. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2711941. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2716645. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2712888. 
Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2752235. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2742437. Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2733243. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2751535. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2724706. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2749074. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2711172. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2469566. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2488305. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2738963. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2727280. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2715591. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2729775. Maximum sequence length: 2049, sample length: 3141 [default0]:Skipping sample id=2481913. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2728701. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2721780. Maximum sequence length: 2049, sample length: 5201 [default0]:Skipping sample id=2722948. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2714814. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2749054. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2749343. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2722444. Maximum sequence length: 2049, sample length: 7112 [default0]:Skipping sample id=2735933. 
Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2729134. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2729646. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2731109. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2489373. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2737753. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2732955. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2741714. Maximum sequence length: 2049, sample length: 4805 [default0]:Skipping sample id=2736101. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2751647. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2730904. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2752077. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2737015. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2717075. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2717722. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2748255. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2726634. Maximum sequence length: 2049, sample length: 4817 [default0]:Skipping sample id=2752314. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2754299. Maximum sequence length: 2049, sample length: 4777 [default0]:Skipping sample id=2753944. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2714324. Maximum sequence length: 2049, sample length: 4186 [default0]:Skipping sample id=2728779. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2722972. 
Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2744803. Maximum sequence length: 2049, sample length: 3092 [default0]:Skipping sample id=2741581. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2712114. Maximum sequence length: 2049, sample length: 5466 [default0]:Skipping sample id=2717434. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2754053. Maximum sequence length: 2049, sample length: 6221 [default0]:Skipping sample id=2753422. Maximum sequence length: 2049, sample length: 2511 [default0]:Skipping sample id=2722075. Maximum sequence length: 2049, sample length: 3916 [default0]:Skipping sample id=2716116. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2744834. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2756181. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2725908. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2738858. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2736344. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2490103. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2755010. Maximum sequence length: 2049, sample length: 5149 [default0]:Skipping sample id=2712950. Maximum sequence length: 2049, sample length: 5439 [default0]:Skipping sample id=2747915. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721252. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2744332. Maximum sequence length: 2049, sample length: 2858 [default0]:Skipping sample id=2736604. Maximum sequence length: 2049, sample length: 2566 [default0]:Skipping sample id=2716438. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2740194. 
Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2747369. Maximum sequence length: 2049, sample length: 5529 [default0]:Skipping sample id=2753599. Maximum sequence length: 2049, sample length: 3331 [default0]:Skipping sample id=2732436. Maximum sequence length: 2049, sample length: 2611 [default0]:Skipping sample id=2754475. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2736450. Maximum sequence length: 2049, sample length: 5350 [default0]:Skipping sample id=2724519. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2484407. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2735800. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2718878. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2715568. Maximum sequence length: 2049, sample length: 4174 [default0]:Skipping sample id=2718345. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2714971. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2716334. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2752108. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2741347. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2751122. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2737469. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2744311. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2747389. Maximum sequence length: 2049, sample length: 5019 [default0]:Skipping sample id=2746688. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2749497. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2712425. 
Maximum sequence length: 2049, sample length: 3169 [default0]:Skipping sample id=2712390. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2728562. Maximum sequence length: 2049, sample length: 5665 [default0]:Skipping sample id=2742426. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2735191. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2752321. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2748237. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2737640. Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2717095. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2730369. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2744299. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2755026. Maximum sequence length: 2049, sample length: 8477 [default0]:Skipping sample id=2716921. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2726201. Maximum sequence length: 2049, sample length: 6531 [default0]:Skipping sample id=2719618. Maximum sequence length: 2049, sample length: 7617 [default0]:Skipping sample id=2716225. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2750122. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2731857. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2756819. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2718781. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2751637. Maximum sequence length: 2049, sample length: 3720 [default0]:Skipping sample id=2729116. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2724508. 
Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2744559. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2724261. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2729837. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2749429. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2731515. Maximum sequence length: 2049, sample length: 4361 [default0]:Skipping sample id=2478255. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2490799. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2732739. Maximum sequence length: 2049, sample length: 5448 [default0]:Skipping sample id=2748597. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2495370. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2717708. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2733630. Maximum sequence length: 2049, sample length: 2796 [default0]:Skipping sample id=2754438. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2739959. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2711608. Maximum sequence length: 2049, sample length: 3459 [default0]:Skipping sample id=2741568. Maximum sequence length: 2049, sample length: 2950 [default0]:Skipping sample id=2741977. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2731582. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2493549. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2713435. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2756424. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2499372. 
Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2731019. Maximum sequence length: 2049, sample length: 2983 [default0]:Skipping sample id=2725032. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2477031. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2718246. Maximum sequence length: 2049, sample length: 6543 [default0]:Skipping sample id=2746518. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2478646. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2753217. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2484670. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2713405. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2730847. Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2714182. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2487867. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2725164. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2733753. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2735308. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2744462. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2748122. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2751857. Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2714946. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2729997. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2751061. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2720800. 
Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2739905. Maximum sequence length: 2049, sample length: 3407 [default0]:Skipping sample id=2496906. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2741926. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2727877. Maximum sequence length: 2049, sample length: 3564 [default0]:Skipping sample id=2467009. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2744221. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2721217. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2494023. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2728502. Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2715995. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2711438. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2736263. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2725628. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2733448. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2727318. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2737167. Maximum sequence length: 2049, sample length: 5028 [default0]:Skipping sample id=2731363. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2737326. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2727159. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2466136. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2732709. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2723163. 
Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2724358. Maximum sequence length: 2049, sample length: 3894 [default0]:Skipping sample id=2730039. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2719372. Maximum sequence length: 2049, sample length: 5207 [default0]:Skipping sample id=2750379. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2720125. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2717119. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2729285. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2485524. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2721939. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2756109. Maximum sequence length: 2049, sample length: 7562 [default0]:Skipping sample id=2756725. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2494042. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2751941. Maximum sequence length: 2049, sample length: 3501 [default0]:Skipping sample id=2725186. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2742085. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2495180. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2733649. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2721129. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2742331. Maximum sequence length: 2049, sample length: 4014 [default0]:Skipping sample id=2467022. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2715954. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2753570. 
Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2735604. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2743310. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2717489. Maximum sequence length: 2049, sample length: 4977 [default0]:Skipping sample id=2725142. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2750603. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2730165. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2724868. Maximum sequence length: 2049, sample length: 3743 [default0]:Skipping sample id=2497913. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2733890. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2716039. Maximum sequence length: 2049, sample length: 2117 [default0]:Skipping sample id=2743612. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2729320. Maximum sequence length: 2049, sample length: 2609 [default0]:Skipping sample id=2734990. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2491277. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2726037. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2743334. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2743275. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2753161. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2486661. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2711924. Maximum sequence length: 2049, sample length: 3275 [default0]:Skipping sample id=2736444. Maximum sequence length: 2049, sample length: 4557 [default0]:Skipping sample id=2726303. 
Maximum sequence length: 2049, sample length: 4834 [default0]:Skipping sample id=2737805. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2716051. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2489204. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2735130. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2496140. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2748821. Maximum sequence length: 2049, sample length: 4777 [default0]:Skipping sample id=2736445. Maximum sequence length: 2049, sample length: 4978 [default0]:Skipping sample id=2751956. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2717334. Maximum sequence length: 2049, sample length: 4749 [default0]:Skipping sample id=2713350. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2711913. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2744576. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2749089. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2740402. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2727994. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2722000. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2755624. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2729880. Maximum sequence length: 2049, sample length: 2845 [default0]:Skipping sample id=2749119. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2755595. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2743493. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725810. 
Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2494846. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2729357. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2741262. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2733164. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2740885. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2751286. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2716535. Maximum sequence length: 2049, sample length: 4052 [default0]:Skipping sample id=2727050. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2493652. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2731475. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2732508. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2733908. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2754392. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2748630. Maximum sequence length: 2049, sample length: 2689 [default0]:Skipping sample id=2750199. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2749882. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2752650. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2725699. Maximum sequence length: 2049, sample length: 2518 [default0]:Skipping sample id=2499200. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2719471. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2744502. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2736383. 
Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2742946. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2470834. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2466461. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2718389. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2743478. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2719839. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2719774. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2730424. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2737613. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2724738. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2740417. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2731636. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2737480. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2733510. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2711998. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2724082. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2732351. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2744075. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2756669. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2755901. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2748072. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2742766. 
Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2749065. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2729437. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2490881. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2716563. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2741074. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2737884. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2734679. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2739018. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2741484. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2732355. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2718324. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2722178. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2724554. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2731157. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2720067. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2737410. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2753228. Maximum sequence length: 2049, sample length: 2980 [default0]:Skipping sample id=2751700. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2720663. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2736593. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2720342. Maximum sequence length: 2049, sample length: 3833 [default0]:Skipping sample id=2754026. 
Maximum sequence length: 2049, sample length: 4592 [default0]:Skipping sample id=2716609. Maximum sequence length: 2049, sample length: 4259 [default0]:Skipping sample id=2496176. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2734967. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2470036. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2721229. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2719278. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2714013. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2715980. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2738540. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2723208. Maximum sequence length: 2049, sample length: 5446 [default0]:Skipping sample id=2720086. Maximum sequence length: 2049, sample length: 3821 [default0]:Skipping sample id=2746825. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2723659. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2720557. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2722506. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2722683. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2734245. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2755646. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2739324. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2720137. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2713949. Maximum sequence length: 2049, sample length: 6017 [default0]:Skipping sample id=2735976. 
Maximum sequence length: 2049, sample length: 4244 [default0]:Skipping sample id=2725675. Maximum sequence length: 2049, sample length: 4813 [default0]:Skipping sample id=2467403. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2726903. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2711370. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2722017. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2720264. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2748984. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2723810. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2750545. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2732996. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2714667. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2736703. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2744746. Maximum sequence length: 2049, sample length: 3611 [default0]:Skipping sample id=2743143. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2738528. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2712573. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2467107. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2724856. Maximum sequence length: 2049, sample length: 2874 [default0]:Skipping sample id=2466221. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2724021. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2733548. Maximum sequence length: 2049, sample length: 3520 [default0]:Skipping sample id=2470479. 
Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2495265. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2742538. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2720022. Maximum sequence length: 2049, sample length: 6626 [default0]:Skipping sample id=2716440. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2749964. Maximum sequence length: 2049, sample length: 3784 [default0]:Skipping sample id=2743706. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2744189. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2756554. Maximum sequence length: 2049, sample length: 3948 [default0]:Skipping sample id=2715867. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2736207. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2738839. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2483803. Maximum sequence length: 2049, sample length: 3392 [default0]:Skipping sample id=2728448. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2744057. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2713623. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2738959. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2734455. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2721599. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2721664. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2736273. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2741198. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2753521. 
Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2749731. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2714957. Maximum sequence length: 2049, sample length: 3821 [default0]:Skipping sample id=2717890. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2713189. Maximum sequence length: 2049, sample length: 6491 [default0]:Skipping sample id=2734643. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2755156. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2738802. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2714726. Maximum sequence length: 2049, sample length: 4293 [default0]:Skipping sample id=2745337. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2740640. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2746420. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2736741. Maximum sequence length: 2049, sample length: 3252 [default0]:Skipping sample id=2478253. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2717354. Maximum sequence length: 2049, sample length: 2953 [default0]:Skipping sample id=2739062. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2715757. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2713638. Maximum sequence length: 2049, sample length: 2973 [default0]:Skipping sample id=2714790. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2728114. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2741447. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2711311. Maximum sequence length: 2049, sample length: 2808 [default0]:Skipping sample id=2713814. 
Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2488072. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2729223. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2496421. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2720863. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2752978. Maximum sequence length: 2049, sample length: 4694 [default0]:Skipping sample id=2496868. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2731293. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2482778. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2722845. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2726601. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2732858. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2718244. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2730893. Maximum sequence length: 2049, sample length: 3152 [default0]:Skipping sample id=2751861. Maximum sequence length: 2049, sample length: 2848 [default0]:Skipping sample id=2466375. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2748198. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2754767. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2715898. Maximum sequence length: 2049, sample length: 4539 [default0]:Skipping sample id=2750481. Maximum sequence length: 2049, sample length: 3667 [default0]:Skipping sample id=2726157. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2740694. Maximum sequence length: 2049, sample length: 3655 [default0]:Skipping sample id=2492546. 
Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2732174. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2740723. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2725247. Maximum sequence length: 2049, sample length: 3754 [default0]:Skipping sample id=2743834. Maximum sequence length: 2049, sample length: 3730 [default0]:Skipping sample id=2486642. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2727235. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2755480. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2494710. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2735758. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2721425. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2482795. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2734261. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2496503. Maximum sequence length: 2049, sample length: 2166 [default0]:Skipping sample id=2489886. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2477121. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2731369. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2483784. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2727069. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2755512. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2713115. Maximum sequence length: 2049, sample length: 3760 [default0]:Skipping sample id=2754735. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2747234. 
Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2736160. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2749574. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2714476. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2712435. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2488424. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2732873. Maximum sequence length: 2049, sample length: 3226 [default0]:Skipping sample id=2713054. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2483786. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2482275. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2729876. Maximum sequence length: 2049, sample length: 7607 [default0]:Skipping sample id=2712652. Maximum sequence length: 2049, sample length: 4802 [default0]:Skipping sample id=2498168. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2749376. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2739443. Maximum sequence length: 2049, sample length: 2990 [default0]:Skipping sample id=2477255. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2742694. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2750016. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2711961. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2495477. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2731280. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2742334. Maximum sequence length: 2049, sample length: 4837 [default0]:Skipping sample id=2480723. 
Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2749806. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2716003. Maximum sequence length: 2049, sample length: 3209 [default0]:Skipping sample id=2751510. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2484316. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2726953. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2746516. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2751863. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2712480. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2715413. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2746558. Maximum sequence length: 2049, sample length: 4310 [default0]:Skipping sample id=2739900. Maximum sequence length: 2049, sample length: 4087 [default0]:Skipping sample id=2723290. Maximum sequence length: 2049, sample length: 3538 [default0]:Skipping sample id=2717532. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2749116. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2741316. Maximum sequence length: 2049, sample length: 3222 [default0]:Skipping sample id=2727845. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2724640. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2747193. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2493578. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2487003. Maximum sequence length: 2049, sample length: 2139 [default0]:Skipping sample id=2734681. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2713303. 
Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2751313. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2716549. Maximum sequence length: 2049, sample length: 3416 [default0]:Skipping sample id=2741292. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2723871. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2714549. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2711714. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2480097. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2712175. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2752830. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2726623. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2746738. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2725519. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2750221. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2467623. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2478939. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2726612. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2720541. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2726658. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2749953. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2734890. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2726851. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2716969. 
Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2720534. Maximum sequence length: 2049, sample length: 2869 [default0]:Skipping sample id=2712408. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2749606. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2737734. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2468040. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2718893. Maximum sequence length: 2049, sample length: 5766 [default0]:Skipping sample id=2727546. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2734695. Maximum sequence length: 2049, sample length: 3410 [default0]:Skipping sample id=2718751. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2751287. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2742827. Maximum sequence length: 2049, sample length: 3022 [default0]:Skipping sample id=2738359. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2721569. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2721360. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2742875. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2734996. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2715734. Maximum sequence length: 2049, sample length: 3961 [default0]:Skipping sample id=2731273. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2723951. Maximum sequence length: 2049, sample length: 4988 [default0]:Skipping sample id=2749541. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2715551. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2753129. 
Maximum sequence length: 2049, sample length: 3936 [default0]:Skipping sample id=2748474. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2739768. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2495729. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2721876. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2493694. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2754207. Maximum sequence length: 2049, sample length: 5541 [default0]:Skipping sample id=2719563. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2479604. Maximum sequence length: 2049, sample length: 2610 [default0]:Skipping sample id=2748152. Maximum sequence length: 2049, sample length: 2322 [default0]:Skipping sample id=2746035. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2755644. Maximum sequence length: 2049, sample length: 3301 [default0]:Skipping sample id=2714631. Maximum sequence length: 2049, sample length: 4084 [default0]:Skipping sample id=2756584. Maximum sequence length: 2049, sample length: 2750 [default0]:Skipping sample id=2750146. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2495676. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2485129. Maximum sequence length: 2049, sample length: 3549 [default0]:Skipping sample id=2725302. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723926. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2749183. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2716934. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2754838. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2750955. 
Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2712439. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2713527. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2489330. Maximum sequence length: 2049, sample length: 3885 [default0]:Skipping sample id=2742925. Maximum sequence length: 2049, sample length: 4952 [default0]:Skipping sample id=2723228. Maximum sequence length: 2049, sample length: 4945 [default0]:Skipping sample id=2747795. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2745680. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2743422. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2715671. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2731752. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2747509. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2724710. Maximum sequence length: 2049, sample length: 6645 [default0]:Skipping sample id=2497896. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2727455. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2746604. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2716754. Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2748924. Maximum sequence length: 2049, sample length: 6222 [default0]:Skipping sample id=2756675. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2749221. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2756754. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2756239. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722684. 
Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2740043. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2738246. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2733254. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2488499. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2726774. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2726882. Maximum sequence length: 2049, sample length: 4094 [default0]:Skipping sample id=2742598. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2490994. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2486251. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2733225. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2725048. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2734454. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2746878. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2733313. Maximum sequence length: 2049, sample length: 4897 [default0]:Skipping sample id=2724880. Maximum sequence length: 2049, sample length: 4540 [default0]:Skipping sample id=2721245. Maximum sequence length: 2049, sample length: 4327 [default0]:Skipping sample id=2754653. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2714788. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2470666. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2714756. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2717736. Maximum sequence length: 2049, sample length: 2706 [default0]:Skipping sample id=2720995. 
Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2736163. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2738692. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2730770. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2711827. Maximum sequence length: 2049, sample length: 3485 [default0]:Skipping sample id=2467414. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2756190. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2726318. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2721997. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2711578. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2740433. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2484231. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2751572. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2729318. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2495812. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2752600. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2744638. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2731386. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2737114. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2740814. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2739046. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2756540. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2731888. 
Maximum sequence length: 2049, sample length: 4893 [default0]:Skipping sample id=2722941. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2750378. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2734942. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2722697. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2736142. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2720181. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2479191. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2727298. Maximum sequence length: 2049, sample length: 4704 [default0]:Skipping sample id=2711723. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2745664. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2716092. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2735142. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2495326. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2471207. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2725131. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2748558. Maximum sequence length: 2049, sample length: 2669 [default0]:Skipping sample id=2719500. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2717460. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2737735. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2744233. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2470913. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2736685. 
Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2722259. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2731184. Maximum sequence length: 2049, sample length: 3707 [default0]:Skipping sample id=2756776. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2466546. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2498674. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2738125. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2731212. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2740824. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2741526. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2727867. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2754799. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2734039. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2487244. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2717017. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2744519. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2733757. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2714598. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2716686. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2752845. Maximum sequence length: 2049, sample length: 7108 [default0]:Skipping sample id=2730864. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2727495. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2739636. 
Maximum sequence length: 2049, sample length: 3346 [default0]:Skipping sample id=2726936. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2468854. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2746134. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2731787. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2494895. Maximum sequence length: 2049, sample length: 4090 [default0]:Skipping sample id=2729335. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2724591. Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2723571. Maximum sequence length: 2049, sample length: 3739 [default0]:Skipping sample id=2719146. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2488979. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2750735. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2715316. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2734858. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2723318. Maximum sequence length: 2049, sample length: 3839 [default0]:Skipping sample id=2482066. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2488711. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2726141. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2715609. Maximum sequence length: 2049, sample length: 3913 [default0]:Skipping sample id=2480504. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2721087. Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2731313. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2725955. 
Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2721630. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2726394. Maximum sequence length: 2049, sample length: 2075 [default0]:Skipping sample id=2484442. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2735413. Maximum sequence length: 2049, sample length: 4145 [default0]:Skipping sample id=2737643. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2477590. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2730999. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2711319. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2719039. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2714996. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2730382. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2728504. Maximum sequence length: 2049, sample length: 6533 [default0]:Skipping sample id=2743512. Maximum sequence length: 2049, sample length: 2991 [default0]:Skipping sample id=2747485. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2717153. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2754878. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2739005. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2715432. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2736501. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2498490. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2733017. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2721686. 
Maximum sequence length: 2049, sample length: 3296 [default0]:Skipping sample id=2731387. Maximum sequence length: 2049, sample length: 3976 [default0]:Skipping sample id=2723777. Maximum sequence length: 2049, sample length: 4003 [default0]:Skipping sample id=2718364. Maximum sequence length: 2049, sample length: 4441 [default0]:Skipping sample id=2744703. Maximum sequence length: 2049, sample length: 2995 [default0]:Skipping sample id=2733090. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2753366. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2753022. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2719501. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2741893. Maximum sequence length: 2049, sample length: 2417 [default0]:Skipping sample id=2494622. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2485279. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2712668. Maximum sequence length: 2049, sample length: 3715 [default0]:Skipping sample id=2718876. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2735271. Maximum sequence length: 2049, sample length: 2904 [default0]:Skipping sample id=2720483. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2754214. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2720776. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2711762. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2715480. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2746676. Maximum sequence length: 2049, sample length: 3333 [default0]:Skipping sample id=2720696. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2729760. 
Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2479343. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2716916. Maximum sequence length: 2049, sample length: 6934 [default0]:Skipping sample id=2724061. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2750328. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2726302. Maximum sequence length: 2049, sample length: 3580 [default0]:Skipping sample id=2744588. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2739268. Maximum sequence length: 2049, sample length: 2527 [default0]:Skipping sample id=2736072. Maximum sequence length: 2049, sample length: 3429 [default0]:Skipping sample id=2485172. Maximum sequence length: 2049, sample length: 3548 [default0]:Skipping sample id=2486933. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2751829. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2741536. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2720851. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2753281. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2731932. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2498116. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2750898. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2740412. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2751067. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714774. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2466802. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2740131. 
Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2746400. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2746408. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2727102. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2715044. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2711070. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2477793. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2746649. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2713970. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2726243. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2494128. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2717374. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2717496. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2740605. Maximum sequence length: 2049, sample length: 3482 [default0]:Skipping sample id=2739504. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2751489. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2731023. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2742340. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2738107. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2723116. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2725307. Maximum sequence length: 2049, sample length: 3842 [default0]:Skipping sample id=2718015. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2716365. 
Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2720699. Maximum sequence length: 2049, sample length: 4024 [default0]:Skipping sample id=2754079. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2735627. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2712885. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2752206. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2748614. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2755570. Maximum sequence length: 2049, sample length: 4598 [default0]:Skipping sample id=2756303. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2740916. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2735374. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2753317. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2736711. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2725372. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2742971. Maximum sequence length: 2049, sample length: 3927 [default0]:Skipping sample id=2736476. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2749253. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2726728. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2720105. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2719093. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2468709. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2726299. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2719256. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2734775. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2714661. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2478863. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2490445. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2716760. Maximum sequence length: 2049, sample length: 4188 [default0]:Skipping sample id=2720345. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2496187. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2736083. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2755981. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2734529. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2493824. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2752792. Maximum sequence length: 2049, sample length: 2338 [default0]:Skipping sample id=2484222. Maximum sequence length: 2049, sample length: 3621 [default0]:Skipping sample id=2717099. Maximum sequence length: 2049, sample length: 3999 [default0]:Skipping sample id=2738068. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2469652. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2492805. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2728510. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2721830. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2490794. Maximum sequence length: 2049, sample length: 3539 [default0]:Skipping sample id=2730737. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2742861. 
Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2751827. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2721581. Maximum sequence length: 2049, sample length: 2928 [default0]:Skipping sample id=2735174. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2756572. Maximum sequence length: 2049, sample length: 3896 [default0]:Skipping sample id=2721163. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2742480. Maximum sequence length: 2049, sample length: 3161 [default0]:Skipping sample id=2742397. Maximum sequence length: 2049, sample length: 3317 [default0]:Skipping sample id=2711522. Maximum sequence length: 2049, sample length: 5194 [default0]:Skipping sample id=2728706. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2746382. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2717689. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2487584. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2728067. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2748015. Maximum sequence length: 2049, sample length: 4908 [default0]:Skipping sample id=2723421. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2480694. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2745500. Maximum sequence length: 2049, sample length: 4225 [default0]:Skipping sample id=2716593. Maximum sequence length: 2049, sample length: 2896 [default0]:Skipping sample id=2468015. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2737959. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2714294. Maximum sequence length: 2049, sample length: 4329 [default0]:Skipping sample id=2729906. 
Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2718354. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2756115. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2713836. Maximum sequence length: 2049, sample length: 2435 [default0]:Skipping sample id=2725561. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2752030. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2714520. Maximum sequence length: 2049, sample length: 2556 [default0]:Skipping sample id=2715343. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2729968. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2727756. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2736554. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2722119. Maximum sequence length: 2049, sample length: 3167 [default0]:Skipping sample id=2711114. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2754130. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2480084. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2747574. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2754761. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2737971. Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2487219. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2729910. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2732747. Maximum sequence length: 2049, sample length: 5507 [default0]:Skipping sample id=2736488. Maximum sequence length: 2049, sample length: 2753 [default0]:Skipping sample id=2713680. 
Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2744937. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2749946. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2748179. Maximum sequence length: 2049, sample length: 2788 [default0]:Skipping sample id=2749834. Maximum sequence length: 2049, sample length: 3472 [default0]:Skipping sample id=2744715. Maximum sequence length: 2049, sample length: 3524 [default0]:Skipping sample id=2755484. Maximum sequence length: 2049, sample length: 6423 [default0]:Skipping sample id=2740208. Maximum sequence length: 2049, sample length: 3047 [default0]:Skipping sample id=2726272. Maximum sequence length: 2049, sample length: 3953 [default0]:Skipping sample id=2715626. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2718922. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2750685. Maximum sequence length: 2049, sample length: 3513 [default0]:Skipping sample id=2730978. Maximum sequence length: 2049, sample length: 4320 [default0]:Skipping sample id=2741665. Maximum sequence length: 2049, sample length: 4231 [default0]:Skipping sample id=2753768. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2716376. Maximum sequence length: 2049, sample length: 4827 [default0]:Skipping sample id=2478669. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2747720. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2478609. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2724646. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2747747. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2716624. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2725310. 
Maximum sequence length: 2049, sample length: 3605 [default0]:Skipping sample id=2748101. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2485062. Maximum sequence length: 2049, sample length: 2083 [default0]:Skipping sample id=2755925. Maximum sequence length: 2049, sample length: 6451 [default0]:Skipping sample id=2730556. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2721675. Maximum sequence length: 2049, sample length: 2495 [default0]:Skipping sample id=2486722. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2751817. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2744791. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2742266. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2496734. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2467368. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2754118. Maximum sequence length: 2049, sample length: 3537 [default0]:Skipping sample id=2485434. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2741534. Maximum sequence length: 2049, sample length: 3747 [default0]:Skipping sample id=2743108. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2713875. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2753823. Maximum sequence length: 2049, sample length: 2659 [default0]:Skipping sample id=2496258. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2711803. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2738449. Maximum sequence length: 2049, sample length: 3134 [default0]:Skipping sample id=2713200. Maximum sequence length: 2049, sample length: 3065 [default0]:Skipping sample id=2724236. 
Maximum sequence length: 2049, sample length: 5430 [default0]:Skipping sample id=2712726. Maximum sequence length: 2049, sample length: 6228 [default0]:Skipping sample id=2745410. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2749138. Maximum sequence length: 2049, sample length: 4193 [default0]:Skipping sample id=2740522. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2743678. Maximum sequence length: 2049, sample length: 3952 [default0]:Skipping sample id=2719547. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2495123. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2746957. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2722005. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2719903. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2711227. Maximum sequence length: 2049, sample length: 3840 [default0]:Skipping sample id=2493778. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2736658. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2736510. Maximum sequence length: 2049, sample length: 3567 [default0]:Skipping sample id=2727777. Maximum sequence length: 2049, sample length: 2677 [default0]:Skipping sample id=2494855. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2713826. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2718627. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2756934. Maximum sequence length: 2049, sample length: 3339 [default0]:Skipping sample id=2752703. Maximum sequence length: 2049, sample length: 2949 [default0]:Skipping sample id=2734219. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2716452. 
Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2731519. Maximum sequence length: 2049, sample length: 3793 [default0]:Skipping sample id=2722721. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2726247. Maximum sequence length: 2049, sample length: 2898 [default0]:Skipping sample id=2754024. Maximum sequence length: 2049, sample length: 5151 [default0]:Skipping sample id=2729981. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2726292. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2715158. Maximum sequence length: 2049, sample length: 5984 [default0]:Skipping sample id=2741564. Maximum sequence length: 2049, sample length: 2863 [default0]:Skipping sample id=2754421. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2478742. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2747461. Maximum sequence length: 2049, sample length: 4547 [default0]:Skipping sample id=2712239. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2743264. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2744180. Maximum sequence length: 2049, sample length: 2844 [default0]:Skipping sample id=2756668. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2716786. Maximum sequence length: 2049, sample length: 2886 [default0]:Skipping sample id=2728802. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2744598. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2484816. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2749128. Maximum sequence length: 2049, sample length: 4795 [default0]:Skipping sample id=2750879. Maximum sequence length: 2049, sample length: 4474 [default0]:Skipping sample id=2743949. 
Maximum sequence length: 2049, sample length: 5536 [default0]:Skipping sample id=2731189. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2732054. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2748082. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2731174. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2731364. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2728529. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2481190. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2748333. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2745484. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2741257. Maximum sequence length: 2049, sample length: 2865 [default0]:Skipping sample id=2752072. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2741432. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2716482. Maximum sequence length: 2049, sample length: 3868 [default0]:Skipping sample id=2749085. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2755049. Maximum sequence length: 2049, sample length: 3222 [default0]:Skipping sample id=2757065. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2751145. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2745756. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2751605. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2721028. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2740226. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2736730. 
Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2727012. Maximum sequence length: 2049, sample length: 6050 [default0]:Skipping sample id=2753049. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2724043. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2755152. Maximum sequence length: 2049, sample length: 3244 [default0]:Skipping sample id=2728318. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2722587. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2728113. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2468207. Maximum sequence length: 2049, sample length: 4327 [default0]:Skipping sample id=2720020. Maximum sequence length: 2049, sample length: 4779 [default0]:Skipping sample id=2481893. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2746950. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2735774. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2745426. Maximum sequence length: 2049, sample length: 2142 [default0]:Skipping sample id=2734264. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2745438. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2742617. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2711848. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2481298. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2721124. Maximum sequence length: 2049, sample length: 3273 [default0]:Skipping sample id=2741279. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2712403. Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2731772. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2738666. Maximum sequence length: 2049, sample length: 6215 [default0]:Skipping sample id=2729476. Maximum sequence length: 2049, sample length: 5807 [default0]:Skipping sample id=2740252. Maximum sequence length: 2049, sample length: 4731 [default0]:Skipping sample id=2755552. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2467096. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2486920. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2729400. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2711147. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2746381. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2753320. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2729873. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2483762. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2485727. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2713119. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2497371. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2465724. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2713417. Maximum sequence length: 2049, sample length: 2903 [default0]:Skipping sample id=2736929. Maximum sequence length: 2049, sample length: 4907 [default0]:Skipping sample id=2736036. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2478924. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2732076. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2745561. 
Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2726617. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2494357. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2718257. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2715244. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2734687. Maximum sequence length: 2049, sample length: 2208 [default0]:Skipping sample id=2481918. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2478339. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2736395. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2723577. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2733712. Maximum sequence length: 2049, sample length: 2665 [default0]:Skipping sample id=2729781. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2723474. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2715080. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2741547. Maximum sequence length: 2049, sample length: 3332 [default0]:Skipping sample id=2748510. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2725623. Maximum sequence length: 2049, sample length: 2410 [default0]:Skipping sample id=2745070. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2749692. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2756069. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2722833. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2732499. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2724104. 
Maximum sequence length: 2049, sample length: 3128 [default0]:Skipping sample id=2736437. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2747707. Maximum sequence length: 2049, sample length: 4043 [default0]:Skipping sample id=2738576. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2747539. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2733545. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2733159. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2746258. Maximum sequence length: 2049, sample length: 5265 [default0]:Skipping sample id=2720803. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2721118. Maximum sequence length: 2049, sample length: 2891 [default0]:Skipping sample id=2751673. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2488986. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2756711. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2720539. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2713878. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2738782. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2737875. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2726323. Maximum sequence length: 2049, sample length: 5995 [default0]:Skipping sample id=2726837. Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2745706. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2495109. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2757014. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2753508. 
Maximum sequence length: 2049, sample length: 4860 [default0]:Skipping sample id=2721333. Maximum sequence length: 2049, sample length: 4789 [default0]:Skipping sample id=2713506. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2489186. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2713201. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2495651. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2742456. Maximum sequence length: 2049, sample length: 3465 [default0]:Skipping sample id=2738503. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2736596. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2711372. Maximum sequence length: 2049, sample length: 4903 [default0]:Skipping sample id=2743497. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2490168. Maximum sequence length: 2049, sample length: 3402 [default0]:Skipping sample id=2715201. Maximum sequence length: 2049, sample length: 4184 [default0]:Skipping sample id=2711829. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2748050. Maximum sequence length: 2049, sample length: 3435 [default0]:Skipping sample id=2724271. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2716259. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2724065. Maximum sequence length: 2049, sample length: 4301 [default0]:Skipping sample id=2712288. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2735126. Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2721222. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2714963. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2482626. 
Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2728982. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2743050. Maximum sequence length: 2049, sample length: 2781 [default0]:Skipping sample id=2713022. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2727332. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2725809. Maximum sequence length: 2049, sample length: 2794 [default0]:Skipping sample id=2734457. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2750302. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2727747. Maximum sequence length: 2049, sample length: 4967 [default0]:Skipping sample id=2741768. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2736125. Maximum sequence length: 2049, sample length: 3902 [default0]:Skipping sample id=2492438. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2746613. Maximum sequence length: 2049, sample length: 2429 [default0]:Skipping sample id=2731034. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2718635. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2484711. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2735476. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2756691. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2723699. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2741475. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2483874. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2739824. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2733776. 
Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2712991. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2724517. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2495466. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2718304. Maximum sequence length: 2049, sample length: 5675 [default0]:Skipping sample id=2733697. Maximum sequence length: 2049, sample length: 4406 [default0]:Skipping sample id=2751909. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2731800. Maximum sequence length: 2049, sample length: 5096 [default0]:Skipping sample id=2496462. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2754816. Maximum sequence length: 2049, sample length: 2179 [default0]:Skipping sample id=2718626. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2746746. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2747819. Maximum sequence length: 2049, sample length: 3010 [default0]:Skipping sample id=2720779. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2485594. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2723102. Maximum sequence length: 2049, sample length: 3015 [default0]:Skipping sample id=2753352. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2485490. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2712381. Maximum sequence length: 2049, sample length: 5332 [default0]:Skipping sample id=2744805. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2711498. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2732477. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2740065. 
Maximum sequence length: 2049, sample length: 2822 [default0]:Skipping sample id=2742068. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2756599. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2490329. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2749061. Maximum sequence length: 2049, sample length: 2860 [default0]:Skipping sample id=2750414. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2747691. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2720523. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2711419. Maximum sequence length: 2049, sample length: 6684 [default0]:Skipping sample id=2751672. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2736841. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2711362. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2492188. Maximum sequence length: 2049, sample length: 2227 [default0]:Skipping sample id=2751952. Maximum sequence length: 2049, sample length: 2854 [default0]:Skipping sample id=2717276. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2735414. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2752598. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2724952. Maximum sequence length: 2049, sample length: 3310 [default0]:Skipping sample id=2734074. Maximum sequence length: 2049, sample length: 2944 [default0]:Skipping sample id=2495102. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2743693. Maximum sequence length: 2049, sample length: 3642 [default0]:Skipping sample id=2755185. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2725336. 
Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2754973. Maximum sequence length: 2049, sample length: 3529 [default0]:Skipping sample id=2735812. Maximum sequence length: 2049, sample length: 2966 [default0]:Skipping sample id=2748661. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2470300. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2485637. Maximum sequence length: 2049, sample length: 2584 [default0]:Skipping sample id=2751879. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2467196. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2728176. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2740255. Maximum sequence length: 2049, sample length: 3536 [default0]:Skipping sample id=2726668. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2723916. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2726002. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2738035. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2493052. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2748398. Maximum sequence length: 2049, sample length: 3336 [default0]:Skipping sample id=2742011. Maximum sequence length: 2049, sample length: 3153 [default0]:Skipping sample id=2751263. Maximum sequence length: 2049, sample length: 2591 [default0]:Skipping sample id=2494740. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2756562. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2725651. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2488256. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2743237. 
Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2753459. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2756621. Maximum sequence length: 2049, sample length: 4530 [default0]:Skipping sample id=2752452. Maximum sequence length: 2049, sample length: 3313 [default0]:Skipping sample id=2481837. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2736348. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2721525. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2737010. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2468759. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2741620. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2731499. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2740554. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2725667. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2714298. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2756068. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2714090. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2741231. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2734283. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2490892. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2730319. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2733717. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2732011. Maximum sequence length: 2049, sample length: 4093 [default0]:Skipping sample id=2716876. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2488589. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2711294. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2728452. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2719386. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2756374. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2732478. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2729995. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2714131. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2756766. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2713377. Maximum sequence length: 2049, sample length: 4000 [default0]:Skipping sample id=2720993. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2727200. Maximum sequence length: 2049, sample length: 7273 [default0]:Skipping sample id=2750761. Maximum sequence length: 2049, sample length: 2925 [default0]:Skipping sample id=2733413. Maximum sequence length: 2049, sample length: 4238 [default0]:Skipping sample id=2743744. Maximum sequence length: 2049, sample length: 4820 [default0]:Skipping sample id=2754521. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2744733. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2755103. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2719826. Maximum sequence length: 2049, sample length: 4570 [default0]:Skipping sample id=2749346. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2725346. Maximum sequence length: 2049, sample length: 2480 [default0]:Skipping sample id=2741769. 
Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2748516. Maximum sequence length: 2049, sample length: 4349 [default0]:Skipping sample id=2480462. Maximum sequence length: 2049, sample length: 2455 [default0]:Skipping sample id=2735183. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2750389. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2711495. Maximum sequence length: 2049, sample length: 4104 [default0]:Skipping sample id=2755879. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2715979. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2490591. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2750323. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2715864. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2492521. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2714317. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2469857. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2726583. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2743299. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2477555. Maximum sequence length: 2049, sample length: 2105 [default0]:Skipping sample id=2726886. Maximum sequence length: 2049, sample length: 3283 [default0]:Skipping sample id=2477712. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2723113. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2734063. Maximum sequence length: 2049, sample length: 3351 [default0]:Skipping sample id=2723177. Maximum sequence length: 2049, sample length: 3837 [default0]:Skipping sample id=2742033. 
Maximum sequence length: 2049, sample length: 4357 [default0]:Skipping sample id=2747560. Maximum sequence length: 2049, sample length: 2828 [default0]:Skipping sample id=2754860. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2481699. Maximum sequence length: 2049, sample length: 2759 [default0]:Skipping sample id=2746953. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2720542. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2719954. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2732064. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2735155. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2744581. Maximum sequence length: 2049, sample length: 4062 [default0]:Skipping sample id=2724347. Maximum sequence length: 2049, sample length: 5824 [default0]:Skipping sample id=2720332. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2469261. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2729090. Maximum sequence length: 2049, sample length: 2298 [default0]:Skipping sample id=2734294. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2749026. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2724888. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2724348. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2717181. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2737316. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2469422. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2749704. Maximum sequence length: 2049, sample length: 3323 [default0]:Skipping sample id=2725929. 
Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2719905. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2721286. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2498694. Maximum sequence length: 2049, sample length: 2597 [default0]:Skipping sample id=2755145. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2733584. Maximum sequence length: 2049, sample length: 3980 [default0]:Skipping sample id=2734535. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2713049. Maximum sequence length: 2049, sample length: 3942 [default0]:Skipping sample id=2491696. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2753275. Maximum sequence length: 2049, sample length: 2430 [default0]:Skipping sample id=2467000. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741560. Maximum sequence length: 2049, sample length: 3826 [default0]:Skipping sample id=2738364. Maximum sequence length: 2049, sample length: 2649 [default0]:Skipping sample id=2740071. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2744304. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2743120. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2724329. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2711236. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2726261. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2754572. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2493054. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2744982. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2731601. 
Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2734486. Maximum sequence length: 2049, sample length: 2487 [default0]:Skipping sample id=2720908. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2728315. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2748304. Maximum sequence length: 2049, sample length: 2404 [default0]:Skipping sample id=2470585. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2741117. Maximum sequence length: 2049, sample length: 4330 [default0]:Skipping sample id=2739348. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2747229. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2757117. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2745435. Maximum sequence length: 2049, sample length: 2098 [default0]:Skipping sample id=2751831. Maximum sequence length: 2049, sample length: 4392 [default0]:Skipping sample id=2730733. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2721483. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2757025. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2751121. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2727419. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2466469. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2729167. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2738912. Maximum sequence length: 2049, sample length: 4771 [default0]:Skipping sample id=2742628. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2732752. Maximum sequence length: 2049, sample length: 3588 [default0]:Skipping sample id=2732333. 
Maximum sequence length: 2049, sample length: 3899 [default0]:Skipping sample id=2725450. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2722316. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2720471. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2723217. Maximum sequence length: 2049, sample length: 3752 [default0]:Skipping sample id=2727328. Maximum sequence length: 2049, sample length: 5310 [default0]:Skipping sample id=2739863. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2493430. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2715429. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2719358. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2736111. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2742202. Maximum sequence length: 2049, sample length: 3094 [default0]:Skipping sample id=2745648. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2747152. Maximum sequence length: 2049, sample length: 5004 [default0]:Skipping sample id=2732759. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2724875. Maximum sequence length: 2049, sample length: 6621 [default0]:Skipping sample id=2717115. Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2728184. Maximum sequence length: 2049, sample length: 4897 [default0]:Skipping sample id=2741028. Maximum sequence length: 2049, sample length: 3409 [default0]:Skipping sample id=2745346. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2732230. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2755989. Maximum sequence length: 2049, sample length: 3709 [default0]:Skipping sample id=2714414. 
Maximum sequence length: 2049, sample length: 2797 [default0]:Skipping sample id=2725275. Maximum sequence length: 2049, sample length: 2792 [default0]:Skipping sample id=2720111. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2490176. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2720899. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2715070. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2739567. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2711811. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2479596. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2751652. Maximum sequence length: 2049, sample length: 3257 [default0]:Skipping sample id=2716567. Maximum sequence length: 2049, sample length: 3382 [default0]:Skipping sample id=2752388. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2746852. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2733355. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2725422. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2745449. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2719961. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2727612. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2722572. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2741983. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2750899. Maximum sequence length: 2049, sample length: 3087 [default0]:Skipping sample id=2729044. Maximum sequence length: 2049, sample length: 2549 [default0]:Skipping sample id=2753718. 
Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2743888. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2493776. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2732408. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2734258. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2748668. Maximum sequence length: 2049, sample length: 3553 [default0]:Skipping sample id=2470354. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2478805. Maximum sequence length: 2049, sample length: 3233 [default0]:Skipping sample id=2477409. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2724182. Maximum sequence length: 2049, sample length: 3512 [default0]:Skipping sample id=2745312. Maximum sequence length: 2049, sample length: 4593 [default0]:Skipping sample id=2751305. Maximum sequence length: 2049, sample length: 3584 [default0]:Skipping sample id=2737171. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2491259. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2717377. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2712042. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2499340. Maximum sequence length: 2049, sample length: 3123 [default0]:Skipping sample id=2744338. Maximum sequence length: 2049, sample length: 3971 [default0]:Skipping sample id=2738803. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2722224. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2488390. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2751784. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2716744. 
Maximum sequence length: 2049, sample length: 4007 [default0]:Skipping sample id=2745975. Maximum sequence length: 2049, sample length: 3911 [default0]:Skipping sample id=2496404. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2736736. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2741705. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2483119. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2748890. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2716852. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2741715. Maximum sequence length: 2049, sample length: 3223 [default0]:Skipping sample id=2730062. Maximum sequence length: 2049, sample length: 2596 [default0]:Skipping sample id=2745926. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2755261. Maximum sequence length: 2049, sample length: 3965 [default0]:Skipping sample id=2738545. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2468355. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2728902. Maximum sequence length: 2049, sample length: 5248 [default0]:Skipping sample id=2739902. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2743209. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2723851. Maximum sequence length: 2049, sample length: 2754 [default0]:Skipping sample id=2719879. Maximum sequence length: 2049, sample length: 3741 [default0]:Skipping sample id=2730242. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2494081. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2718475. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2719830. 
Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2740014. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2740419. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2716938. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2740985. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2751891. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2721180. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2467847. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2756295. Maximum sequence length: 2049, sample length: 3523 [default0]:Skipping sample id=2714590. Maximum sequence length: 2049, sample length: 3450 [default0]:Skipping sample id=2714639. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2744320. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2718932. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2722305. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2728576. Maximum sequence length: 2049, sample length: 6481 [default0]:Skipping sample id=2732584. Maximum sequence length: 2049, sample length: 4038 [default0]:Skipping sample id=2744610. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2747050. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2726434. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2469480. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2711931. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2729126. Maximum sequence length: 2049, sample length: 4117 [default0]:Skipping sample id=2718862. 
Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2470318. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2737556. Maximum sequence length: 2049, sample length: 4606 [default0]:Skipping sample id=2478116. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2750701. Maximum sequence length: 2049, sample length: 5064 [default0]:Skipping sample id=2732255. Maximum sequence length: 2049, sample length: 3420 [default0]:Skipping sample id=2732952. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2750365. Maximum sequence length: 2049, sample length: 5056 [default0]:Skipping sample id=2742504. Maximum sequence length: 2049, sample length: 5080 [default0]:Skipping sample id=2749590. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2489736. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2752292. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2727900. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2725474. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2468862. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2746543. Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2719179. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2470848. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2756348. Maximum sequence length: 2049, sample length: 4008 [default0]:Skipping sample id=2735512. Maximum sequence length: 2049, sample length: 3195 [default0]:Skipping sample id=2717987. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2722415. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2493109. 
Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2711625. Maximum sequence length: 2049, sample length: 6060 [default0]:Skipping sample id=2733658. Maximum sequence length: 2049, sample length: 3380 [default0]:Skipping sample id=2755622. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2482938. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2741496. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2467937. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2469994. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2742967. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2741544. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2742719. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2742970. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2727310. Maximum sequence length: 2049, sample length: 4089 [default0]:Skipping sample id=2719122. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2744557. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2746520. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2487881. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2717587. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2734814. Maximum sequence length: 2049, sample length: 2526 [default0]:Skipping sample id=2489744. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2753568. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2732801. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2754565. 
Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2747449. Maximum sequence length: 2049, sample length: 3371 [default0]:Skipping sample id=2710975. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2723663. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2715331. Maximum sequence length: 2049, sample length: 3554 [default0]:Skipping sample id=2488983. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2739609. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2724198. Maximum sequence length: 2049, sample length: 2590 [default0]:Skipping sample id=2731936. Maximum sequence length: 2049, sample length: 3406 [default0]:Skipping sample id=2714523. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2466346. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2722991. Maximum sequence length: 2049, sample length: 3192 [default0]:Skipping sample id=2743017. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2731419. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2478086. Maximum sequence length: 2049, sample length: 3890 [default0]:Skipping sample id=2493604. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2735842. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2719753. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2716767. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2479724. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2721206. Maximum sequence length: 2049, sample length: 2748 [default0]:Skipping sample id=2741057. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2743780. 
Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2734669. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2744966. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2732939. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2746617. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2749436. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2724175. Maximum sequence length: 2049, sample length: 4006 [default0]:Skipping sample id=2743177. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2716213. Maximum sequence length: 2049, sample length: 4069 [default0]:Skipping sample id=2737995. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2712736. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2755213. Maximum sequence length: 2049, sample length: 5864 [default0]:Skipping sample id=2729379. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2738635. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2750513. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2716422. Maximum sequence length: 2049, sample length: 2249 [default0]:Skipping sample id=2482522. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2751082. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2733657. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2749800. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2466667. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2711835. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2745474. 
Maximum sequence length: 2049, sample length: 2540 [default0]:Skipping sample id=2724059. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2494578. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2736384. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2735158. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2744878. Maximum sequence length: 2049, sample length: 4467 [default0]:Skipping sample id=2747452. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2742286. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2481865. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2750441. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2713733. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2466246. Maximum sequence length: 2049, sample length: 2211 [default0]:Skipping sample id=2740287. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2469180. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2733307. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2483216. Maximum sequence length: 2049, sample length: 2900 [default0]:Skipping sample id=2718199. Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2735290. Maximum sequence length: 2049, sample length: 3418 [default0]:Skipping sample id=2718330. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2741047. Maximum sequence length: 2049, sample length: 3577 [default0]:Skipping sample id=2717182. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2733963. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2731302. 
Maximum sequence length: 2049, sample length: 4063 [default0]:Skipping sample id=2718966. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2724807. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2486139. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2748490. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2468159. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2484466. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2749013. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2717449. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2744688. Maximum sequence length: 2049, sample length: 2530 [default0]:Skipping sample id=2734279. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2732108. Maximum sequence length: 2049, sample length: 2328 [default0]:Skipping sample id=2715473. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2737152. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2740762. Maximum sequence length: 2049, sample length: 2690 [default0]:Skipping sample id=2732519. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2749894. Maximum sequence length: 2049, sample length: 3356 [default0]:Skipping sample id=2711219. Maximum sequence length: 2049, sample length: 2731 [default0]:Skipping sample id=2730781. Maximum sequence length: 2049, sample length: 2405 [default0]:Skipping sample id=2737270. Maximum sequence length: 2049, sample length: 5841 [default0]:Skipping sample id=2716831. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735957. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2721008. 
Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2748061. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2722753. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2720494. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2714005. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2714031. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2743327. Maximum sequence length: 2049, sample length: 2365 [default0]:Skipping sample id=2752032. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2714229. Maximum sequence length: 2049, sample length: 4013 [default0]:Skipping sample id=2714919. Maximum sequence length: 2049, sample length: 3988 [default0]:Skipping sample id=2486616. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2711379. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2718953. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2711752. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2733069. Maximum sequence length: 2049, sample length: 2486 [default0]:Skipping sample id=2720997. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2740737. Maximum sequence length: 2049, sample length: 6141 [default0]:Skipping sample id=2719496. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2725008. Maximum sequence length: 2049, sample length: 3748 [default0]:Skipping sample id=2750176. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2486656. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2720189. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2739509. 
Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2740715. Maximum sequence length: 2049, sample length: 2426 [default0]:Skipping sample id=2478725. Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2729088. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2712766. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2753947. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2742298. Maximum sequence length: 2049, sample length: 2720 [default0]:Skipping sample id=2735866. Maximum sequence length: 2049, sample length: 4522 [default0]:Skipping sample id=2743145. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2751818. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2467608. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2477483. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2736959. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2487664. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2737083. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2733065. Maximum sequence length: 2049, sample length: 4754 [default0]:Skipping sample id=2712222. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2720472. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2737417. Maximum sequence length: 2049, sample length: 3746 [default0]:Skipping sample id=2746247. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2718783. Maximum sequence length: 2049, sample length: 2448 [default0]:Skipping sample id=2747133. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2739115. 
Maximum sequence length: 2049, sample length: 6054 [default0]:Skipping sample id=2726786. Maximum sequence length: 2049, sample length: 3276 [default0]:Skipping sample id=2731066. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2482584. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2723433. Maximum sequence length: 2049, sample length: 5682 [default0]:Skipping sample id=2724720. Maximum sequence length: 2049, sample length: 5172 [default0]:Skipping sample id=2720874. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2717009. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2477778. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2720878. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2752026. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2732792. Maximum sequence length: 2049, sample length: 2951 [default0]:Skipping sample id=2480333. Maximum sequence length: 2049, sample length: 3318 [default0]:Skipping sample id=2747763. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2479952. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2734381. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2728568. Maximum sequence length: 2049, sample length: 4406 [default0]:Skipping sample id=2735216. Maximum sequence length: 2049, sample length: 2599 [default0]:Skipping sample id=2713341. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2468625. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2742281. Maximum sequence length: 2049, sample length: 4596 [default0]:Skipping sample id=2731155. Maximum sequence length: 2049, sample length: 2538 [default0]:Skipping sample id=2723276. 
Maximum sequence length: 2049, sample length: 2932 [default0]:Skipping sample id=2739430. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2749263. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2745466. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2715702. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2755846. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2745310. Maximum sequence length: 2049, sample length: 3307 [default0]:Skipping sample id=2498103. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2486126. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2478657. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2720737. Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2751798. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2716602. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2484071. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2720373. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2729508. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2734617. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730135. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2732914. Maximum sequence length: 2049, sample length: 3253 [default0]:Skipping sample id=2723559. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2727988. Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2732177. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2715691. 
Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2744321. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2729866. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2467330. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2736459. Maximum sequence length: 2049, sample length: 6514 [default0]:Skipping sample id=2732667. Maximum sequence length: 2049, sample length: 5060 [default0]:Skipping sample id=2742137. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2726618. Maximum sequence length: 2049, sample length: 2791 [default0]:Skipping sample id=2735431. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2734449. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2748409. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2741402. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2727474. Maximum sequence length: 2049, sample length: 4322 [default0]:Skipping sample id=2482959. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2467567. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2743686. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2731730. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2722851. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2470459. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2729883. Maximum sequence length: 2049, sample length: 3368 [default0]:Skipping sample id=2722118. Maximum sequence length: 2049, sample length: 5142 [default0]:Skipping sample id=2748293. Maximum sequence length: 2049, sample length: 2513 [default0]:Skipping sample id=2722522. 
Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2747618. Maximum sequence length: 2049, sample length: 3117 [default0]:Skipping sample id=2738403. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2715644. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2717349. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2751990. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2494173. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2743141. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2715145. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2724379. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2715442. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2754472. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2733991. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2711473. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2751654. Maximum sequence length: 2049, sample length: 3797 [default0]:Skipping sample id=2723466. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2726844. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2743593. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2712895. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2716244. Maximum sequence length: 2049, sample length: 2382 [default0]:Skipping sample id=2733831. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2724440. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2744691. 
Maximum sequence length: 2049, sample length: 3892 [default0]:Skipping sample id=2481561. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2731068. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2715263. Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2477892. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2494911. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2732343. Maximum sequence length: 2049, sample length: 3475 [default0]:Skipping sample id=2488154. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2720072. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2734606. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2714231. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2483496. Maximum sequence length: 2049, sample length: 2337 [default0]:Skipping sample id=2746642. Maximum sequence length: 2049, sample length: 2974 [default0]:Skipping sample id=2737669. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2744306. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2715866. Maximum sequence length: 2049, sample length: 3212 [default0]:Skipping sample id=2719101. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2712795. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2712990. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2481777. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2728215. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2731255. Maximum sequence length: 2049, sample length: 3088 [default0]:Skipping sample id=2740485. 
Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2720990. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2738604. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2715836. Maximum sequence length: 2049, sample length: 3656 [default0]:Skipping sample id=2727411. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2745267. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2715793. Maximum sequence length: 2049, sample length: 2827 [default0]:Skipping sample id=2733716. Maximum sequence length: 2049, sample length: 3055 [default0]:Skipping sample id=2736840. Maximum sequence length: 2049, sample length: 4345 [default0]:Skipping sample id=2722219. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2738853. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2723280. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2731285. Maximum sequence length: 2049, sample length: 4529 [default0]:Skipping sample id=2754596. Maximum sequence length: 2049, sample length: 3977 [default0]:Skipping sample id=2724500. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2729459. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2726506. Maximum sequence length: 2049, sample length: 2500 [default0]:Skipping sample id=2749017. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2717423. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2496149. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2725939. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2491183. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2729660. 
Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2712248. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2748636. Maximum sequence length: 2049, sample length: 3048 [default0]:Skipping sample id=2732356. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2477012. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2726965. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2465984. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2466575. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2743917. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2745765. Maximum sequence length: 2049, sample length: 3077 [default0]:Skipping sample id=2466054. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2467156. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2735348. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2739886. Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2734420. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2756077. Maximum sequence length: 2049, sample length: 4373 [default0]:Skipping sample id=2726342. Maximum sequence length: 2049, sample length: 2956 [default0]:Skipping sample id=2738047. Maximum sequence length: 2049, sample length: 2864 [default0]:Skipping sample id=2756027. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2746341. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2477618. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2729568. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2498568. 
Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2482077. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2711792. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2749763. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2723042. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2740733. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2716283. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2749058. Maximum sequence length: 2049, sample length: 6975 [default0]:Skipping sample id=2486316. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2732891. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2730565. Maximum sequence length: 2049, sample length: 5980 [default0]:Skipping sample id=2725738. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2493491. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2495588. Maximum sequence length: 2049, sample length: 2237 [default0]:Skipping sample id=2717896. Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2495358. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2752277. Maximum sequence length: 2049, sample length: 2699 [default0]:Skipping sample id=2729898. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2720524. Maximum sequence length: 2049, sample length: 5542 [default0]:Skipping sample id=2737437. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2735608. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2749667. Maximum sequence length: 2049, sample length: 5617 [default0]:Skipping sample id=2718712. 
Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2719011. Maximum sequence length: 2049, sample length: 4670 [default0]:Skipping sample id=2489496. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2493941. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2747518. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2751167. Maximum sequence length: 2049, sample length: 3058 [default0]:Skipping sample id=2715258. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2744374. Maximum sequence length: 2049, sample length: 4547 [default0]:Skipping sample id=2742857. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2725317. Maximum sequence length: 2049, sample length: 5094 [default0]:Skipping sample id=2734365. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2715601. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2728008. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2730981. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746310. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2714089. Maximum sequence length: 2049, sample length: 5817 [default0]:Skipping sample id=2495725. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2713571. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2715377. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2743524. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2718024. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2735300. Maximum sequence length: 2049, sample length: 2512 [default0]:Skipping sample id=2485372. 
Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2718873. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2726539. Maximum sequence length: 2049, sample length: 3352 [default0]:Skipping sample id=2730475. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2735387. Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2748139. Maximum sequence length: 2049, sample length: 3506 [default0]:Skipping sample id=2719655. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2723069. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2724093. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2747993. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2746496. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2714219. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2721075. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2753468. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2715508. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2493072. Maximum sequence length: 2049, sample length: 2593 [default0]:Skipping sample id=2751923. Maximum sequence length: 2049, sample length: 4239 [default0]:Skipping sample id=2713749. Maximum sequence length: 2049, sample length: 3591 [default0]:Skipping sample id=2466092. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2735147. Maximum sequence length: 2049, sample length: 4074 [default0]:Skipping sample id=2737742. Maximum sequence length: 2049, sample length: 3931 [default0]:Skipping sample id=2724396. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2746943. 
Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2482787. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2729839. Maximum sequence length: 2049, sample length: 6628 [default0]:Skipping sample id=2745339. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2734697. Maximum sequence length: 2049, sample length: 5964 [default0]:Skipping sample id=2482371. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2753948. Maximum sequence length: 2049, sample length: 3089 [default0]:Skipping sample id=2753229. Maximum sequence length: 2049, sample length: 3932 [default0]:Skipping sample id=2483673. Maximum sequence length: 2049, sample length: 2569 [default0]:Skipping sample id=2722136. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2720231. Maximum sequence length: 2049, sample length: 4304 [default0]:Skipping sample id=2713194. Maximum sequence length: 2049, sample length: 2462 [default0]:Skipping sample id=2755412. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2746839. Maximum sequence length: 2049, sample length: 5866 [default0]:Skipping sample id=2711644. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2728001. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2713594. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2734394. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2750388. Maximum sequence length: 2049, sample length: 4297 [default0]:Skipping sample id=2741215. Maximum sequence length: 2049, sample length: 3807 [default0]:Skipping sample id=2488495. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2747654. Maximum sequence length: 2049, sample length: 2101 [default0]:Skipping sample id=2732515. 
Maximum sequence length: 2049, sample length: 3560 [default0]:Skipping sample id=2727079. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2482887. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2716977. Maximum sequence length: 2049, sample length: 4381 [default0]:Skipping sample id=2753490. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2722009. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2752341. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2726986. Maximum sequence length: 2049, sample length: 2069 [default0]:Skipping sample id=2742272. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2751123. Maximum sequence length: 2049, sample length: 5868 [default0]:Skipping sample id=2722121. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2732942. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2481100. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2717781. Maximum sequence length: 2049, sample length: 4802 [default0]:Skipping sample id=2728862. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2477156. Maximum sequence length: 2049, sample length: 2618 [default0]:Skipping sample id=2749803. Maximum sequence length: 2049, sample length: 2113 [default0]:Skipping sample id=2745952. Maximum sequence length: 2049, sample length: 5751 [default0]:Skipping sample id=2746811. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2755099. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2718801. Maximum sequence length: 2049, sample length: 2545 [default0]:Skipping sample id=2731843. Maximum sequence length: 2049, sample length: 2492 [default0]:Skipping sample id=2749670. 
Maximum sequence length: 2049, sample length: 2695 [default0]:Skipping sample id=2756392. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2755507. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2740130. Maximum sequence length: 2049, sample length: 4289 [default0]:Skipping sample id=2721973. Maximum sequence length: 2049, sample length: 3250 [default0]:Skipping sample id=2721109. Maximum sequence length: 2049, sample length: 2873 [default0]:Skipping sample id=2721453. Maximum sequence length: 2049, sample length: 4645 [default0]:Skipping sample id=2477942. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2727274. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2736300. Maximum sequence length: 2049, sample length: 4486 [default0]:Skipping sample id=2712012. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2711096. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2467293. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2733635. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2756023. Maximum sequence length: 2049, sample length: 3082 [default0]:Skipping sample id=2742431. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2491873. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2748353. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2499226. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2726230. Maximum sequence length: 2049, sample length: 2770 [default0]:Skipping sample id=2479495. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2489224. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2734100. 
Maximum sequence length: 2049, sample length: 3436 [default0]:Skipping sample id=2733742. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2712089. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2714960. Maximum sequence length: 2049, sample length: 3634 [default0]:Skipping sample id=2711887. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2736797. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2739319. Maximum sequence length: 2049, sample length: 6630 [default0]:Skipping sample id=2723252. Maximum sequence length: 2049, sample length: 4726 [default0]:Skipping sample id=2711040. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2717835. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2721241. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2739594. Maximum sequence length: 2049, sample length: 5811 [default0]:Skipping sample id=2725502. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2497905. Maximum sequence length: 2049, sample length: 3114 [default0]:Skipping sample id=2749575. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2742083. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2715008. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2730479. Maximum sequence length: 2049, sample length: 6255 [default0]:Skipping sample id=2720026. Maximum sequence length: 2049, sample length: 4155 [default0]:Skipping sample id=2722825. Maximum sequence length: 2049, sample length: 3476 [default0]:Skipping sample id=2737820. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2730775. Maximum sequence length: 2049, sample length: 6421 [default0]:Skipping sample id=2483470. 
Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735893. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2719503. Maximum sequence length: 2049, sample length: 2386 [default0]:Skipping sample id=2737827. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2715890. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2725687. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2737525. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2482706. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2740976. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2726435. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2738450. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2734196. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2492542. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2726507. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2755930. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2716028. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2739145. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2746490. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2731798. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2747226. Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2738731. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2750428. Maximum sequence length: 2049, sample length: 3770 [default0]:Skipping sample id=2737951. 
Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2470622. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2747912. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2755095. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2741208. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2729648. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2747214. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2756372. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2735123. Maximum sequence length: 2049, sample length: 4975 [default0]:Skipping sample id=2742558. Maximum sequence length: 2049, sample length: 2668 [default0]:Skipping sample id=2492555. Maximum sequence length: 2049, sample length: 3664 [default0]:Skipping sample id=2495976. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2727590. Maximum sequence length: 2049, sample length: 5933 [default0]:Skipping sample id=2734833. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2717455. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2726164. Maximum sequence length: 2049, sample length: 3626 [default0]:Skipping sample id=2748017. Maximum sequence length: 2049, sample length: 5323 [default0]:Skipping sample id=2482038. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2750527. Maximum sequence length: 2049, sample length: 4205 [default0]:Skipping sample id=2739203. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2736154. Maximum sequence length: 2049, sample length: 2635 [default0]:Skipping sample id=2713532. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2733434. 
Maximum sequence length: 2049, sample length: 4924 [default0]:Skipping sample id=2750812. Maximum sequence length: 2049, sample length: 3242 [default0]:Skipping sample id=2757092. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2467895. Maximum sequence length: 2049, sample length: 2918 [default0]:Skipping sample id=2745938. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2714892. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2751331. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2488885. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2721498. Maximum sequence length: 2049, sample length: 2972 [default0]:Skipping sample id=2737327. Maximum sequence length: 2049, sample length: 3464 [default0]:Skipping sample id=2726995. Maximum sequence length: 2049, sample length: 3449 [default0]:Skipping sample id=2715169. Maximum sequence length: 2049, sample length: 3622 [default0]:Skipping sample id=2483181. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2750995. Maximum sequence length: 2049, sample length: 6148 [default0]:Skipping sample id=2736414. Maximum sequence length: 2049, sample length: 4528 [default0]:Skipping sample id=2731815. Maximum sequence length: 2049, sample length: 2958 [default0]:Skipping sample id=2742063. Maximum sequence length: 2049, sample length: 2714 [default0]:Skipping sample id=2493137. Maximum sequence length: 2049, sample length: 2716 [default0]:Skipping sample id=2754910. Maximum sequence length: 2049, sample length: 2225 [default0]:Skipping sample id=2738300. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2735446. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2717070. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2753416. 
Maximum sequence length: 2049, sample length: 3639 [default0]:Skipping sample id=2745923. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2727675. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2745650. Maximum sequence length: 2049, sample length: 4243 [default0]:Skipping sample id=2733857. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2486205. Maximum sequence length: 2049, sample length: 2667 [default0]:Skipping sample id=2754776. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2469262. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2717852. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2738378. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2736370. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2713991. Maximum sequence length: 2049, sample length: 3432 [default0]:Skipping sample id=2728306. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2718012. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2739623. Maximum sequence length: 2049, sample length: 5143 [default0]:Skipping sample id=2712959. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2722639. Maximum sequence length: 2049, sample length: 5339 [default0]:Skipping sample id=2477157. Maximum sequence length: 2049, sample length: 2606 [default0]:Skipping sample id=2718050. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2747158. Maximum sequence length: 2049, sample length: 4868 [default0]:Skipping sample id=2757035. Maximum sequence length: 2049, sample length: 3155 [default0]:Skipping sample id=2489876. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2715749. 
Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2485089. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2720798. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2755308. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2733129. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2757112. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2724638. Maximum sequence length: 2049, sample length: 2627 [default0]:Skipping sample id=2715383. Maximum sequence length: 2049, sample length: 3595 [default0]:Skipping sample id=2727945. Maximum sequence length: 2049, sample length: 4324 [default0]:Skipping sample id=2739656. Maximum sequence length: 2049, sample length: 4248 [default0]:Skipping sample id=2714241. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2722996. Maximum sequence length: 2049, sample length: 3080 [default0]:Skipping sample id=2741748. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2746882. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2733381. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2729233. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2747136. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2715886. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2725427. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2751480. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2740012. Maximum sequence length: 2049, sample length: 2560 [default0]:Skipping sample id=2738400. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2740217. 
Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2749430. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2479048. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2483085. Maximum sequence length: 2049, sample length: 3179 [default0]:Skipping sample id=2714259. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2754961. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2724385. Maximum sequence length: 2049, sample length: 2090 [default0]:Skipping sample id=2722938. Maximum sequence length: 2049, sample length: 2196 [default0]:Skipping sample id=2491688. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2737454. Maximum sequence length: 2049, sample length: 3413 [default0]:Skipping sample id=2483858. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2714415. Maximum sequence length: 2049, sample length: 2213 [default0]:Skipping sample id=2735114. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2751242. Maximum sequence length: 2049, sample length: 2968 [default0]:Skipping sample id=2746673. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2749433. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2717769. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2719405. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2730560. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2745364. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2745384. Maximum sequence length: 2049, sample length: 3537 [default0]:Skipping sample id=2751667. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2711986. 
Maximum sequence length: 2049, sample length: 5181 [default0]:Skipping sample id=2717171. Maximum sequence length: 2049, sample length: 3846 [default0]:Skipping sample id=2719568. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2728981. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2726005. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2722822. Maximum sequence length: 2049, sample length: 3181 [default0]:Skipping sample id=2495085. Maximum sequence length: 2049, sample length: 2861 [default0]:Skipping sample id=2720730. Maximum sequence length: 2049, sample length: 2505 [default0]:Skipping sample id=2739092. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2745917. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714909. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2752558. Maximum sequence length: 2049, sample length: 4506 [default0]:Skipping sample id=2493878. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2724762. Maximum sequence length: 2049, sample length: 2122 [default0]:Skipping sample id=2741205. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2741254. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2497886. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2740728. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2748408. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2712166. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2751308. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2726811. Maximum sequence length: 2049, sample length: 2052 [default0]:Skipping sample id=2488956. 
Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2748497. Maximum sequence length: 2049, sample length: 2789 [default0]:Skipping sample id=2743243. Maximum sequence length: 2049, sample length: 4620 [default0]:Skipping sample id=2480665. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2724672. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2722647. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2738986. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2713089. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2465913. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2752592. Maximum sequence length: 2049, sample length: 4842 [default0]:Skipping sample id=2718300. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2731794. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2740519. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2744446. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2750743. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2755794. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2723100. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2739451. Maximum sequence length: 2049, sample length: 3050 [default0]:Skipping sample id=2737351. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2480015. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2738816. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2714525. Maximum sequence length: 2049, sample length: 3882 [default0]:Skipping sample id=2755160. 
Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2731986. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2725606. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2467856. Maximum sequence length: 2049, sample length: 2085 [default0]:Skipping sample id=2744841. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2718266. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2725902. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2741711. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2739471. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2467982. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2734977. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2727039. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2711708. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2741599. Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2729304. Maximum sequence length: 2049, sample length: 3989 [default0]:Skipping sample id=2749825. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2489643. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2740516. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2719593. Maximum sequence length: 2049, sample length: 3353 [default0]:Skipping sample id=2746641. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2752395. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2754584. Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2735266. 
Maximum sequence length: 2049, sample length: 3731 [default0]:Skipping sample id=2466448. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2753960. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2730985. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2711154. Maximum sequence length: 2049, sample length: 2060 [default0]:Skipping sample id=2719660. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2719201. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2495816. Maximum sequence length: 2049, sample length: 2414 [default0]:Skipping sample id=2497044. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2740109. Maximum sequence length: 2049, sample length: 3175 [default0]:Skipping sample id=2730107. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2720907. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2718342. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2755917. Maximum sequence length: 2049, sample length: 3040 [default0]:Skipping sample id=2478460. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2711504. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2745478. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2734005. Maximum sequence length: 2049, sample length: 3185 [default0]:Skipping sample id=2716869. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2718443. Maximum sequence length: 2049, sample length: 2633 [default0]:Skipping sample id=2470621. Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2745248. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2750158. 
Maximum sequence length: 2049, sample length: 4097 [default0]:Skipping sample id=2719330. Maximum sequence length: 2049, sample length: 4703 [default0]:Skipping sample id=2723016. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2729572. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2719254. Maximum sequence length: 2049, sample length: 4600 [default0]:Skipping sample id=2743172. Maximum sequence length: 2049, sample length: 3189 [default0]:Skipping sample id=2756248. Maximum sequence length: 2049, sample length: 2210 [default0]:Skipping sample id=2743231. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2732611. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2730190. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2745200. Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2741970. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2746812. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2750517. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2734172. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2740443. Maximum sequence length: 2049, sample length: 5058 [default0]:Skipping sample id=2751771. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2743058. Maximum sequence length: 2049, sample length: 2636 [default0]:Skipping sample id=2751898. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2716971. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2725826. Maximum sequence length: 2049, sample length: 4181 [default0]:Skipping sample id=2739685. Maximum sequence length: 2049, sample length: 3598 [default0]:Skipping sample id=2735326. 
Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2730819. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2733830. Maximum sequence length: 2049, sample length: 2623 [default0]:Skipping sample id=2488630. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2712640. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2753357. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2734574. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2739733. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2714165. Maximum sequence length: 2049, sample length: 5322 [default0]:Skipping sample id=2748909. Maximum sequence length: 2049, sample length: 4916 [default0]:Skipping sample id=2712028. Maximum sequence length: 2049, sample length: 3171 [default0]:Skipping sample id=2725036. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2755122. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2756682. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2723858. Maximum sequence length: 2049, sample length: 4768 [default0]:Skipping sample id=2731893. Maximum sequence length: 2049, sample length: 3481 [default0]:Skipping sample id=2740668. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2726906. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2727311. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2465769. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721166. Maximum sequence length: 2049, sample length: 3302 [default0]:Skipping sample id=2715354. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2722615. 
Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2731809. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2736439. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2720594. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2724944. Maximum sequence length: 2049, sample length: 2264 [default0]:Skipping sample id=2722196. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2723085. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2741282. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2719017. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2722656. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2740309. Maximum sequence length: 2049, sample length: 2334 [default0]:Skipping sample id=2724722. Maximum sequence length: 2049, sample length: 3360 [default0]:Skipping sample id=2751752. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2718150. Maximum sequence length: 2049, sample length: 6409 [default0]:Skipping sample id=2737518. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2712080. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2754897. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2469820. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2752014. Maximum sequence length: 2049, sample length: 3322 [default0]:Skipping sample id=2729925. Maximum sequence length: 2049, sample length: 4765 [default0]:Skipping sample id=2743884. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2724652. Maximum sequence length: 2049, sample length: 2099 [default0]:Skipping sample id=2734521. 
Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2728025. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2741272. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2749473. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2731077. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2736411. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2747976. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2493242. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2748814. Maximum sequence length: 2049, sample length: 5624 [default0]:Skipping sample id=2756028. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2730376. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2754622. Maximum sequence length: 2049, sample length: 3761 [default0]:Skipping sample id=2720066. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2718253. Maximum sequence length: 2049, sample length: 4371 [default0]:Skipping sample id=2754551. Maximum sequence length: 2049, sample length: 3929 [default0]:Skipping sample id=2739299. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2756566. Maximum sequence length: 2049, sample length: 3770 [default0]:Skipping sample id=2717838. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2719413. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2482474. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2716093. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2494659. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2748825. 
Maximum sequence length: 2049, sample length: 4035 [default0]:Skipping sample id=2722962. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2727461. Maximum sequence length: 2049, sample length: 3740 [default0]:Skipping sample id=2717127. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2729987. Maximum sequence length: 2049, sample length: 2233 [default0]:Skipping sample id=2755558. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2736594. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2721385. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2738118. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2751046. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2732403. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2717504. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2718724. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2718479. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2744752. Maximum sequence length: 2049, sample length: 5145 [default0]:Skipping sample id=2718699. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2716907. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2726646. Maximum sequence length: 2049, sample length: 4401 [default0]:Skipping sample id=2749467. Maximum sequence length: 2049, sample length: 3978 [default0]:Skipping sample id=2721372. Maximum sequence length: 2049, sample length: 3462 [default0]:Skipping sample id=2722924. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2747730. Maximum sequence length: 2049, sample length: 3401 [default0]:Skipping sample id=2734397. 
Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2712978. Maximum sequence length: 2049, sample length: 3915 [default0]:Skipping sample id=2731366. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2726124. Maximum sequence length: 2049, sample length: 3219 [default0]:Skipping sample id=2727192. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2747360. Maximum sequence length: 2049, sample length: 2333 [default0]:Skipping sample id=2722455. Maximum sequence length: 2049, sample length: 4162 [default0]:Skipping sample id=2484802. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2740543. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2496741. Maximum sequence length: 2049, sample length: 3530 [default0]:Skipping sample id=2753363. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2494139. Maximum sequence length: 2049, sample length: 3073 [default0]:Skipping sample id=2751343. Maximum sequence length: 2049, sample length: 3685 [default0]:Skipping sample id=2725953. Maximum sequence length: 2049, sample length: 3967 [default0]:Skipping sample id=2712360. Maximum sequence length: 2049, sample length: 4977 [default0]:Skipping sample id=2721866. Maximum sequence length: 2049, sample length: 3466 [default0]:Skipping sample id=2748697. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2735785. Maximum sequence length: 2049, sample length: 3834 [default0]:Skipping sample id=2724734. Maximum sequence length: 2049, sample length: 3863 [default0]:Skipping sample id=2741473. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2730413. Maximum sequence length: 2049, sample length: 3578 [default0]:Skipping sample id=2715669. Maximum sequence length: 2049, sample length: 3269 [default0]:Skipping sample id=2712607. 
Maximum sequence length: 2049, sample length: 2706 [default0]:Skipping sample id=2715830. Maximum sequence length: 2049, sample length: 2305 [default0]:Skipping sample id=2739283. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2714934. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2714192. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2732624. Maximum sequence length: 2049, sample length: 2393 [default0]:Skipping sample id=2721221. Maximum sequence length: 2049, sample length: 2177 [default0]:Skipping sample id=2756277. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2719763. Maximum sequence length: 2049, sample length: 3101 [default0]:Skipping sample id=2728485. Maximum sequence length: 2049, sample length: 3044 [default0]:Skipping sample id=2727021. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2747367. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2755284. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2724136. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2755774. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2732992. Maximum sequence length: 2049, sample length: 5076 [default0]:Skipping sample id=2755712. Maximum sequence length: 2049, sample length: 3806 [default0]:Skipping sample id=2725892. Maximum sequence length: 2049, sample length: 4240 [default0]:Skipping sample id=2728545. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2748476. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2716727. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2488581. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2715764. 
Maximum sequence length: 2049, sample length: 2371 [default0]:Skipping sample id=2744835. Maximum sequence length: 2049, sample length: 3776 [default0]:Skipping sample id=2746453. Maximum sequence length: 2049, sample length: 2981 [default0]:Skipping sample id=2731440. Maximum sequence length: 2049, sample length: 2266 [default0]:Skipping sample id=2733867. Maximum sequence length: 2049, sample length: 3695 [default0]:Skipping sample id=2732232. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2753607. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2751754. Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2469088. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2732173. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2748011. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2714961. Maximum sequence length: 2049, sample length: 4832 [default0]:Skipping sample id=2489756. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2735042. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2740689. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2719004. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2480714. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2477756. Maximum sequence length: 2049, sample length: 3159 [default0]:Skipping sample id=2722137. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2487866. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2479972. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2720221. Maximum sequence length: 2049, sample length: 6265 [default0]:Skipping sample id=2479939. 
Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2736639. Maximum sequence length: 2049, sample length: 3180 [default0]:Skipping sample id=2496768. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2749331. Maximum sequence length: 2049, sample length: 3457 [default0]:Skipping sample id=2727185. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2493470. Maximum sequence length: 2049, sample length: 2215 [default0]:Skipping sample id=2744496. Maximum sequence length: 2049, sample length: 4440 [default0]:Skipping sample id=2756459. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2715192. Maximum sequence length: 2049, sample length: 2055 [default0]:Skipping sample id=2755450. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2720708. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2479062. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2733078. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2715695. Maximum sequence length: 2049, sample length: 3960 [default0]:Skipping sample id=2733416. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2721577. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2742682. Maximum sequence length: 2049, sample length: 4368 [default0]:Skipping sample id=2729301. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2719943. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2724631. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2735041. Maximum sequence length: 2049, sample length: 3872 [default0]:Skipping sample id=2747742. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2754292. 
Maximum sequence length: 2049, sample length: 3364 [default0]:Skipping sample id=2480106. Maximum sequence length: 2049, sample length: 3236 [default0]:Skipping sample id=2467958. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2729242. Maximum sequence length: 2049, sample length: 2952 [default0]:Skipping sample id=2756073. Maximum sequence length: 2049, sample length: 5015 [default0]:Skipping sample id=2733671. Maximum sequence length: 2049, sample length: 7200 [default0]:Skipping sample id=2717924. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2727103. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2729300. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2744247. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2723927. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2717216. Maximum sequence length: 2049, sample length: 4332 [default0]:Skipping sample id=2726235. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2735057. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2723946. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2738335. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2717047. Maximum sequence length: 2049, sample length: 3881 [default0]:Skipping sample id=2721634. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2748260. Maximum sequence length: 2049, sample length: 3126 [default0]:Skipping sample id=2745645. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2717767. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2731334. Maximum sequence length: 2049, sample length: 4303 [default0]:Skipping sample id=2731972. 
Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2715977. Maximum sequence length: 2049, sample length: 2145 [default0]:Skipping sample id=2713068. Maximum sequence length: 2049, sample length: 3990 [default0]:Skipping sample id=2712169. Maximum sequence length: 2049, sample length: 3468 [default0]:Skipping sample id=2752359. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2733360. Maximum sequence length: 2049, sample length: 2998 [default0]:Skipping sample id=2727765. Maximum sequence length: 2049, sample length: 4475 [default0]:Skipping sample id=2756639. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2731544. Maximum sequence length: 2049, sample length: 3013 [default0]:Skipping sample id=2482070. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2747033. Maximum sequence length: 2049, sample length: 2862 [default0]:Skipping sample id=2754287. Maximum sequence length: 2049, sample length: 3066 [default0]:Skipping sample id=2487611. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2726042. Maximum sequence length: 2049, sample length: 4503 [default0]:Skipping sample id=2742216. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2715409. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2751991. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2742497. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2715589. Maximum sequence length: 2049, sample length: 4736 [default0]:Skipping sample id=2711911. Maximum sequence length: 2049, sample length: 4714 [default0]:Skipping sample id=2730922. Maximum sequence length: 2049, sample length: 2189 [default0]:Skipping sample id=2738861. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2492429. 
Maximum sequence length: 2049, sample length: 2182 [default0]:Skipping sample id=2711365. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2492357. Maximum sequence length: 2049, sample length: 2201 [default0]:Skipping sample id=2750240. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2468310. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2742788. Maximum sequence length: 2049, sample length: 3328 [default0]:Skipping sample id=2737169. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2746767. Maximum sequence length: 2049, sample length: 4121 [default0]:Skipping sample id=2714937. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2750329. Maximum sequence length: 2049, sample length: 3404 [default0]:Skipping sample id=2752771. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2482068. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2751839. Maximum sequence length: 2049, sample length: 4321 [default0]:Skipping sample id=2714898. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2724836. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2738789. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2477386. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2749921. Maximum sequence length: 2049, sample length: 3070 [default0]:Skipping sample id=2747678. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2468797. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2736609. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2726632. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2717693. 
Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2744563. Maximum sequence length: 2049, sample length: 2209 [default0]:Skipping sample id=2730648. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2744047. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2727211. Maximum sequence length: 2049, sample length: 2715 [default0]:Skipping sample id=2731362. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2490968. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2720402. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2742015. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2746351. Maximum sequence length: 2049, sample length: 2456 [default0]:Skipping sample id=2733064. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2744687. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2489855. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2493987. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2741424. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2738523. Maximum sequence length: 2049, sample length: 3023 [default0]:Skipping sample id=2466012. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2730244. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2493081. Maximum sequence length: 2049, sample length: 2541 [default0]:Skipping sample id=2753495. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2713868. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2754491. Maximum sequence length: 2049, sample length: 2741 [default0]:Skipping sample id=2748935. 
Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2724556. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2747189. Maximum sequence length: 2049, sample length: 4362 [default0]:Skipping sample id=2742633. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2728776. Maximum sequence length: 2049, sample length: 3498 [default0]:Skipping sample id=2728842. Maximum sequence length: 2049, sample length: 5707 [default0]:Skipping sample id=2755903. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2736503. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2735762. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2495124. Maximum sequence length: 2049, sample length: 3349 [default0]:Skipping sample id=2750713. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2755668. Maximum sequence length: 2049, sample length: 3679 [default0]:Skipping sample id=2712306. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2727460. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2498775. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2494751. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2733828. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2747187. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2726250. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2723146. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2755328. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2727370. Maximum sequence length: 2049, sample length: 2776 [default0]:Skipping sample id=2496923. 
Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2722563. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2739310. Maximum sequence length: 2049, sample length: 3545 [default0]:Skipping sample id=2738991. Maximum sequence length: 2049, sample length: 4219 [default0]:Skipping sample id=2734710. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2723604. Maximum sequence length: 2049, sample length: 4369 [default0]:Skipping sample id=2717014. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2491719. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2748472. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2719038. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2721517. Maximum sequence length: 2049, sample length: 4418 [default0]:Skipping sample id=2733316. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2732830. Maximum sequence length: 2049, sample length: 2801 [default0]:Skipping sample id=2719347. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2737081. Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2719217. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2728538. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2726606. Maximum sequence length: 2049, sample length: 2721 [default0]:Skipping sample id=2490354. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2732736. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2486464. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2728919. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2721892. 
Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2467250. Maximum sequence length: 2049, sample length: 3274 [default0]:Skipping sample id=2746383. Maximum sequence length: 2049, sample length: 3621 [default0]:Skipping sample id=2748500. Maximum sequence length: 2049, sample length: 3162 [default0]:Skipping sample id=2743991. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2722248. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2722150. Maximum sequence length: 2049, sample length: 3106 [default0]:Skipping sample id=2725568. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2732042. Maximum sequence length: 2049, sample length: 4673 [default0]:Skipping sample id=2749991. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2736704. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2733674. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2722879. Maximum sequence length: 2049, sample length: 2080 [default0]:Skipping sample id=2717772. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2724171. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2749392. Maximum sequence length: 2049, sample length: 3071 [default0]:Skipping sample id=2752791. Maximum sequence length: 2049, sample length: 2626 [default0]:Skipping sample id=2492434. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2756966. Maximum sequence length: 2049, sample length: 5239 [default0]:Skipping sample id=2737433. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2498146. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2721750. Maximum sequence length: 2049, sample length: 4053 [default0]:Skipping sample id=2745260. 
Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2497441. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2750166. Maximum sequence length: 2049, sample length: 6563 [default0]:Skipping sample id=2747256. Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2495672. Maximum sequence length: 2049, sample length: 2510 [default0]:Skipping sample id=2750466. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2717795. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2736975. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2753316. Maximum sequence length: 2049, sample length: 2640 [default0]:Skipping sample id=2716470. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2716904. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2727227. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2739491. Maximum sequence length: 2049, sample length: 2937 [default0]:Skipping sample id=2478722. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2716843. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2752373. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2737138. Maximum sequence length: 2049, sample length: 4125 [default0]:Skipping sample id=2740290. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2750929. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2466717. Maximum sequence length: 2049, sample length: 2303 [default0]:Skipping sample id=2723937. Maximum sequence length: 2049, sample length: 3153 [default0]:Skipping sample id=2731879. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2733631. 
Maximum sequence length: 2049, sample length: 3750 [default0]:Skipping sample id=2719410. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2726366. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2752050. Maximum sequence length: 2049, sample length: 4844 [default0]:Skipping sample id=2721007. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2719498. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2494557. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2726181. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2743152. Maximum sequence length: 2049, sample length: 4414 [default0]:Skipping sample id=2715742. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2478267. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2725870. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2751345. Maximum sequence length: 2049, sample length: 3428 [default0]:Skipping sample id=2723453. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2751511. Maximum sequence length: 2049, sample length: 3365 [default0]:Skipping sample id=2495133. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2724439. Maximum sequence length: 2049, sample length: 5981 [default0]:Skipping sample id=2737434. Maximum sequence length: 2049, sample length: 3494 [default0]:Skipping sample id=2711888. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2733305. Maximum sequence length: 2049, sample length: 4075 [default0]:Skipping sample id=2719569. Maximum sequence length: 2049, sample length: 6863 [default0]:Skipping sample id=2730232. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2719388. 
Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2718045. Maximum sequence length: 2049, sample length: 3156 [default0]:Skipping sample id=2738354. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2746778. Maximum sequence length: 2049, sample length: 4070 [default0]:Skipping sample id=2747110. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2732040. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2751280. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2743968. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2469354. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2732871. Maximum sequence length: 2049, sample length: 2116 [default0]:Skipping sample id=2712030. Maximum sequence length: 2049, sample length: 14257 [default0]:Skipping sample id=2481989. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2722176. Maximum sequence length: 2049, sample length: 3888 [default0]:Skipping sample id=2740097. Maximum sequence length: 2049, sample length: 6417 [default0]:Skipping sample id=2746669. Maximum sequence length: 2049, sample length: 3102 [default0]:Skipping sample id=2481113. Maximum sequence length: 2049, sample length: 2202 [default0]:Skipping sample id=2752891. Maximum sequence length: 2049, sample length: 2821 [default0]:Skipping sample id=2748571. Maximum sequence length: 2049, sample length: 3638 [default0]:Skipping sample id=2752859. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2731415. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2752151. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2728171. Maximum sequence length: 2049, sample length: 2214 [default0]:Skipping sample id=2714686. 
Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2470740. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2756089. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2740722. Maximum sequence length: 2049, sample length: 7556 [default0]:Skipping sample id=2754534. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2715279. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2722030. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2755151. Maximum sequence length: 2049, sample length: 8127 [default0]:Skipping sample id=2741838. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2487996. Maximum sequence length: 2049, sample length: 2783 [default0]:Skipping sample id=2494163. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2467689. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2753670. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2724365. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2714832. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2721103. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2744955. Maximum sequence length: 2049, sample length: 5859 [default0]:Skipping sample id=2712579. Maximum sequence length: 2049, sample length: 2483 [default0]:Skipping sample id=2727690. Maximum sequence length: 2049, sample length: 6946 [default0]:Skipping sample id=2738220. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2713489. Maximum sequence length: 2049, sample length: 3196 [default0]:Skipping sample id=2731465. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2746717. 
Maximum sequence length: 2049, sample length: 6239 [default0]:Skipping sample id=2720387. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2723372. Maximum sequence length: 2049, sample length: 2813 [default0]:Skipping sample id=2748963. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2752963. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2742931. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2479085. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2733732. Maximum sequence length: 2049, sample length: 3458 [default0]:Skipping sample id=2713151. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2753257. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2726689. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2721194. Maximum sequence length: 2049, sample length: 5988 [default0]:Skipping sample id=2756626. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2718606. Maximum sequence length: 2049, sample length: 3847 [default0]:Skipping sample id=2746020. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2718912. Maximum sequence length: 2049, sample length: 2293 [default0]:Skipping sample id=2721191. Maximum sequence length: 2049, sample length: 3191 [default0]:Skipping sample id=2715539. Maximum sequence length: 2049, sample length: 3759 [default0]:Skipping sample id=2752007. Maximum sequence length: 2049, sample length: 2076 [default0]:Skipping sample id=2736365. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2711260. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2722084. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2750660. 
Maximum sequence length: 2049, sample length: 3726 [default0]:Skipping sample id=2753039. Maximum sequence length: 2049, sample length: 2612 [default0]:Skipping sample id=2710963. Maximum sequence length: 2049, sample length: 2170 [default0]:Skipping sample id=2729045. Maximum sequence length: 2049, sample length: 3487 [default0]:Skipping sample id=2711838. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2467366. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2748108. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2748531. Maximum sequence length: 2049, sample length: 5160 [default0]:Skipping sample id=2487206. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2737237. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2491793. Maximum sequence length: 2049, sample length: 2402 [default0]:Skipping sample id=2724160. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2713978. Maximum sequence length: 2049, sample length: 5423 [default0]:Skipping sample id=2734834. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2727903. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2731881. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2722731. Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2495028. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2725393. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2466398. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2480672. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2750831. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2480575. 
Maximum sequence length: 2049, sample length: 2167 [default0]:Skipping sample id=2723735. Maximum sequence length: 2049, sample length: 3854 [default0]:Skipping sample id=2735861. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2732833. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2737919. Maximum sequence length: 2049, sample length: 2415 [default0]:Skipping sample id=2755696. Maximum sequence length: 2049, sample length: 2341 [default0]:Skipping sample id=2730796. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2754697. Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2729279. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2741720. Maximum sequence length: 2049, sample length: 3062 [default0]:Skipping sample id=2723647. Maximum sequence length: 2049, sample length: 3515 [default0]:Skipping sample id=2499301. Maximum sequence length: 2049, sample length: 2908 [default0]:Skipping sample id=2493108. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2723305. Maximum sequence length: 2049, sample length: 4189 [default0]:Skipping sample id=2719282. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2711131. Maximum sequence length: 2049, sample length: 4878 [default0]:Skipping sample id=2720810. Maximum sequence length: 2049, sample length: 2494 [default0]:Skipping sample id=2744832. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2748084. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2739452. Maximum sequence length: 2049, sample length: 4060 [default0]:Skipping sample id=2727114. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2739585. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2720972. 
Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2742004. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2725416. Maximum sequence length: 2049, sample length: 5991 [default0]:Skipping sample id=2751185. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2723190. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2712997. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2746215. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2720765. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2718153. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2712877. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2731711. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2738773. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2487921. Maximum sequence length: 2049, sample length: 3096 [default0]:Skipping sample id=2727511. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2470381. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2495403. Maximum sequence length: 2049, sample length: 2825 [default0]:Skipping sample id=2711271. Maximum sequence length: 2049, sample length: 2267 [default0]:Skipping sample id=2753435. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2734480. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2722648. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2482299. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2478142. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2727849. 
Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2719623. Maximum sequence length: 2049, sample length: 4153 [default0]:Skipping sample id=2719328. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2467479. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2712594. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2722181. Maximum sequence length: 2049, sample length: 4106 [default0]:Skipping sample id=2747236. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2469413. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2715303. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2725473. Maximum sequence length: 2049, sample length: 3441 [default0]:Skipping sample id=2756210. Maximum sequence length: 2049, sample length: 5756 [default0]:Skipping sample id=2720499. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2734987. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2714991. Maximum sequence length: 2049, sample length: 3235 [default0]:Skipping sample id=2715638. Maximum sequence length: 2049, sample length: 2645 [default0]:Skipping sample id=2484554. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2722811. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2739228. Maximum sequence length: 2049, sample length: 3003 [default0]:Skipping sample id=2730159. Maximum sequence length: 2049, sample length: 2547 [default0]:Skipping sample id=2726335. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2754930. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2738294. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2741029. 
Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2719796. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2740079. Maximum sequence length: 2049, sample length: 2577 [default0]:Skipping sample id=2718874. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2724994. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2489474. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2716677. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2467012. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2756931. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2734184. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2742613. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2745238. Maximum sequence length: 2049, sample length: 3177 [default0]:Skipping sample id=2731396. Maximum sequence length: 2049, sample length: 4883 [default0]:Skipping sample id=2738542. Maximum sequence length: 2049, sample length: 4183 [default0]:Skipping sample id=2719300. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2488560. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2717084. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2715969. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2482697. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2738198. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2730272. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2742210. Maximum sequence length: 2049, sample length: 2354 [default0]:Skipping sample id=2729185. 
Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2488770. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2719919. Maximum sequence length: 2049, sample length: 2115 [default0]:Skipping sample id=2729180. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2750351. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2712560. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2747904. Maximum sequence length: 2049, sample length: 2631 [default0]:Skipping sample id=2736470. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2722062. Maximum sequence length: 2049, sample length: 4173 [default0]:Skipping sample id=2723065. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2724939. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2716564. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2746652. Maximum sequence length: 2049, sample length: 2194 [default0]:Skipping sample id=2751934. Maximum sequence length: 2049, sample length: 4141 [default0]:Skipping sample id=2711920. Maximum sequence length: 2049, sample length: 2621 [default0]:Skipping sample id=2756335. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2469758. Maximum sequence length: 2049, sample length: 3110 [default0]:Skipping sample id=2745666. Maximum sequence length: 2049, sample length: 3505 [default0]:Skipping sample id=2749795. Maximum sequence length: 2049, sample length: 2238 [default0]:Skipping sample id=2716889. Maximum sequence length: 2049, sample length: 2489 [default0]:Skipping sample id=2741889. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2734428. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2738487. 
Maximum sequence length: 2049, sample length: 4695 [default0]:Skipping sample id=2732525. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2721757. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2469954. Maximum sequence length: 2049, sample length: 2685 [default0]:Skipping sample id=2745834. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2736576. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2751444. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2749240. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2730261. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2754615. Maximum sequence length: 2049, sample length: 3009 [default0]:Skipping sample id=2714499. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2752867. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2484911. Maximum sequence length: 2049, sample length: 2765 [default0]:Skipping sample id=2750613. Maximum sequence length: 2049, sample length: 5319 [default0]:Skipping sample id=2734866. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2713986. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2747821. Maximum sequence length: 2049, sample length: 3619 [default0]:Skipping sample id=2746464. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2743569. Maximum sequence length: 2049, sample length: 2604 [default0]:Skipping sample id=2466452. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2719637. Maximum sequence length: 2049, sample length: 4426 [default0]:Skipping sample id=2746689. Maximum sequence length: 2049, sample length: 3827 [default0]:Skipping sample id=2732281. 
Maximum sequence length: 2049, sample length: 2758 [default0]:Skipping sample id=2746946. Maximum sequence length: 2049, sample length: 4131 [default0]:Skipping sample id=2747383. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2735481. Maximum sequence length: 2049, sample length: 2927 [default0]:Skipping sample id=2715612. Maximum sequence length: 2049, sample length: 2996 [default0]:Skipping sample id=2725540. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2734991. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2739274. Maximum sequence length: 2049, sample length: 4375 [default0]:Skipping sample id=2737276. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2722783. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2745770. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2745091. Maximum sequence length: 2049, sample length: 2262 [default0]:Skipping sample id=2743023. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2732951. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2732395. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2724789. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2756930. Maximum sequence length: 2049, sample length: 4421 [default0]:Skipping sample id=2717360. Maximum sequence length: 2049, sample length: 5517 [default0]:Skipping sample id=2752880. Maximum sequence length: 2049, sample length: 4227 [default0]:Skipping sample id=2716050. Maximum sequence length: 2049, sample length: 2655 [default0]:Skipping sample id=2752892. Maximum sequence length: 2049, sample length: 4936 [default0]:Skipping sample id=2734910. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2741761. 
Maximum sequence length: 2049, sample length: 4114 [default0]:Skipping sample id=2482480. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2746179. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2713763. Maximum sequence length: 2049, sample length: 2398 [default0]:Skipping sample id=2741178. Maximum sequence length: 2049, sample length: 4867 [default0]:Skipping sample id=2745058. Maximum sequence length: 2049, sample length: 2172 [default0]:Skipping sample id=2743129. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2731104. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2733673. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2730004. Maximum sequence length: 2049, sample length: 6604 [default0]:Skipping sample id=2727225. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2741989. Maximum sequence length: 2049, sample length: 7148 [default0]:Skipping sample id=2751350. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2732755. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2712100. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2742030. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2492274. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2752118. Maximum sequence length: 2049, sample length: 3358 [default0]:Skipping sample id=2715177. Maximum sequence length: 2049, sample length: 2451 [default0]:Skipping sample id=2489497. Maximum sequence length: 2049, sample length: 2439 [default0]:Skipping sample id=2739100. Maximum sequence length: 2049, sample length: 5527 [default0]:Skipping sample id=2749380. Maximum sequence length: 2049, sample length: 3812 [default0]:Skipping sample id=2496805. 
Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2714149. Maximum sequence length: 2049, sample length: 3945 [default0]:Skipping sample id=2711130. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2725143. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2754160. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2727171. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2735525. Maximum sequence length: 2049, sample length: 3186 [default0]:Skipping sample id=2732950. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2711582. Maximum sequence length: 2049, sample length: 2742 [default0]:Skipping sample id=2748841. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2750840. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2720445. Maximum sequence length: 2049, sample length: 4182 [default0]:Skipping sample id=2495903. Maximum sequence length: 2049, sample length: 2204 [default0]:Skipping sample id=2722896. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2494338. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2713631. Maximum sequence length: 2049, sample length: 2063 [default0]:Skipping sample id=2714462. Maximum sequence length: 2049, sample length: 5165 [default0]:Skipping sample id=2733639. Maximum sequence length: 2049, sample length: 3445 [default0]:Skipping sample id=2752152. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2736369. Maximum sequence length: 2049, sample length: 3996 [default0]:Skipping sample id=2719324. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2728530. Maximum sequence length: 2049, sample length: 2885 [default0]:Skipping sample id=2747153. 
Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2718189. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2756207. Maximum sequence length: 2049, sample length: 2871 [default0]:Skipping sample id=2731278. Maximum sequence length: 2049, sample length: 3678 [default0]:Skipping sample id=2738632. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2747578. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2487399. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2485662. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2488248. Maximum sequence length: 2049, sample length: 2149 [default0]:Skipping sample id=2730197. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2498540. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2727661. Maximum sequence length: 2049, sample length: 2525 [default0]:Skipping sample id=2735015. Maximum sequence length: 2049, sample length: 2071 [default0]:Skipping sample id=2484294. Maximum sequence length: 2049, sample length: 2372 [default0]:Skipping sample id=2750952. Maximum sequence length: 2049, sample length: 3662 [default0]:Skipping sample id=2716142. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2716491. Maximum sequence length: 2049, sample length: 2245 [default0]:Skipping sample id=2715092. Maximum sequence length: 2049, sample length: 6265 [default0]:Skipping sample id=2728979. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2735440. Maximum sequence length: 2049, sample length: 4587 [default0]:Skipping sample id=2477860. Maximum sequence length: 2049, sample length: 2370 [default0]:Skipping sample id=2721804. Maximum sequence length: 2049, sample length: 3604 [default0]:Skipping sample id=2730277. 
Maximum sequence length: 2049, sample length: 3083 [default0]:Skipping sample id=2722245. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2731479. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2714452. Maximum sequence length: 2049, sample length: 6638 [default0]:Skipping sample id=2751229. Maximum sequence length: 2049, sample length: 2784 [default0]:Skipping sample id=2734244. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2716070. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2740641. Maximum sequence length: 2049, sample length: 3079 [default0]:Skipping sample id=2738053. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2739215. Maximum sequence length: 2049, sample length: 3733 [default0]:Skipping sample id=2713667. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2725435. Maximum sequence length: 2049, sample length: 2458 [default0]:Skipping sample id=2495315. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2713086. Maximum sequence length: 2049, sample length: 3681 [default0]:Skipping sample id=2712053. Maximum sequence length: 2049, sample length: 3224 [default0]:Skipping sample id=2724692. Maximum sequence length: 2049, sample length: 2717 [default0]:Skipping sample id=2490088. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2724156. Maximum sequence length: 2049, sample length: 2558 [default0]:Skipping sample id=2743578. Maximum sequence length: 2049, sample length: 3135 [default0]:Skipping sample id=2745212. Maximum sequence length: 2049, sample length: 3618 [default0]:Skipping sample id=2719555. Maximum sequence length: 2049, sample length: 3531 [default0]:Skipping sample id=2743685. Maximum sequence length: 2049, sample length: 5616 [default0]:Skipping sample id=2728922. 
Maximum sequence length: 2049, sample length: 4158 [default0]:Skipping sample id=2720251. Maximum sequence length: 2049, sample length: 3526 [default0]:Skipping sample id=2736250. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2723155. Maximum sequence length: 2049, sample length: 4256 [default0]:Skipping sample id=2725534. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2725115. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2731807. Maximum sequence length: 2049, sample length: 3270 [default0]:Skipping sample id=2754602. Maximum sequence length: 2049, sample length: 3738 [default0]:Skipping sample id=2755814. Maximum sequence length: 2049, sample length: 3511 [default0]:Skipping sample id=2720561. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2717195. Maximum sequence length: 2049, sample length: 6438 [default0]:Skipping sample id=2747757. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2738016. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2720518. Maximum sequence length: 2049, sample length: 2138 [default0]:Skipping sample id=2718143. Maximum sequence length: 2049, sample length: 3408 [default0]:Skipping sample id=2725948. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2754361. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2729296. Maximum sequence length: 2049, sample length: 3602 [default0]:Skipping sample id=2728015. Maximum sequence length: 2049, sample length: 5347 [default0]:Skipping sample id=2755684. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2740181. Maximum sequence length: 2049, sample length: 3306 [default0]:Skipping sample id=2713323. Maximum sequence length: 2049, sample length: 6328 [default0]:Skipping sample id=2712489. 
Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2716014. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2748658. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2719721. Maximum sequence length: 2049, sample length: 2330 [default0]:Skipping sample id=2721481. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2497302. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2722926. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2714778. Maximum sequence length: 2049, sample length: 3254 [default0]:Skipping sample id=2727265. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2736095. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2713525. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2716415. Maximum sequence length: 2049, sample length: 4514 [default0]:Skipping sample id=2743563. Maximum sequence length: 2049, sample length: 3935 [default0]:Skipping sample id=2737717. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2484158. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2712295. Maximum sequence length: 2049, sample length: 2452 [default0]:Skipping sample id=2755963. Maximum sequence length: 2049, sample length: 3267 [default0]:Skipping sample id=2749815. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2714858. Maximum sequence length: 2049, sample length: 3098 [default0]:Skipping sample id=2724676. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2722286. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2727635. Maximum sequence length: 2049, sample length: 2147 [default0]:Skipping sample id=2467535. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2742058. Maximum sequence length: 2049, sample length: 4909 [default0]:Skipping sample id=2727960. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2727728. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2486910. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2478129. Maximum sequence length: 2049, sample length: 3608 [default0]:Skipping sample id=2733480. Maximum sequence length: 2049, sample length: 3813 [default0]:Skipping sample id=2740016. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2728927. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2743039. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2479090. Maximum sequence length: 2049, sample length: 2304 [default0]:Skipping sample id=2736718. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2752955. Maximum sequence length: 2049, sample length: 3954 [default0]:Skipping sample id=2742716. Maximum sequence length: 2049, sample length: 3115 [default0]:Skipping sample id=2717302. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2749611. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2742959. Maximum sequence length: 2049, sample length: 3151 [default0]:Skipping sample id=2748379. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2738705. Maximum sequence length: 2049, sample length: 3729 [default0]:Skipping sample id=2733724. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2731380. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2498450. Maximum sequence length: 2049, sample length: 2355 [default0]:Skipping sample id=2487540. 
Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2727325. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2734777. Maximum sequence length: 2049, sample length: 3036 [default0]:Skipping sample id=2713245. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2490069. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2743287. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2742432. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2726985. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2715311. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2747953. Maximum sequence length: 2049, sample length: 2842 [default0]:Skipping sample id=2745531. Maximum sequence length: 2049, sample length: 3717 [default0]:Skipping sample id=2735507. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2468031. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2728050. Maximum sequence length: 2049, sample length: 3820 [default0]:Skipping sample id=2732674. Maximum sequence length: 2049, sample length: 3796 [default0]:Skipping sample id=2751902. Maximum sequence length: 2049, sample length: 3562 [default0]:Skipping sample id=2727226. Maximum sequence length: 2049, sample length: 2841 [default0]:Skipping sample id=2731144. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2498612. Maximum sequence length: 2049, sample length: 2532 [default0]:Skipping sample id=2730867. Maximum sequence length: 2049, sample length: 2385 [default0]:Skipping sample id=2736715. Maximum sequence length: 2049, sample length: 2502 [default0]:Skipping sample id=2721557. Maximum sequence length: 2049, sample length: 2975 [default0]:Skipping sample id=2477316. 
Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2745896. Maximum sequence length: 2049, sample length: 3440 [default0]:Skipping sample id=2742832. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2722377. Maximum sequence length: 2049, sample length: 2766 [default0]:Skipping sample id=2711730. Maximum sequence length: 2049, sample length: 5312 [default0]:Skipping sample id=2736346. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2712663. Maximum sequence length: 2049, sample length: 3830 [default0]:Skipping sample id=2730447. Maximum sequence length: 2049, sample length: 2531 [default0]:Skipping sample id=2719567. Maximum sequence length: 2049, sample length: 3802 [default0]:Skipping sample id=2754980. Maximum sequence length: 2049, sample length: 3635 [default0]:Skipping sample id=2723395. Maximum sequence length: 2049, sample length: 2730 [default0]:Skipping sample id=2715215. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2729311. Maximum sequence length: 2049, sample length: 3495 [default0]:Skipping sample id=2485296. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2717602. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2711843. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2718271. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2490458. Maximum sequence length: 2049, sample length: 2729 [default0]:Skipping sample id=2732943. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2722629. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2733989. Maximum sequence length: 2049, sample length: 3327 [default0]:Skipping sample id=2736239. Maximum sequence length: 2049, sample length: 8151 [default0]:Skipping sample id=2715293. 
Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2749727. Maximum sequence length: 2049, sample length: 2705 [default0]:Skipping sample id=2726947. Maximum sequence length: 2049, sample length: 4957 [default0]:Skipping sample id=2729823. Maximum sequence length: 2049, sample length: 4709 [default0]:Skipping sample id=2483148. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2752872. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2725188. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2753046. Maximum sequence length: 2049, sample length: 3334 [default0]:Skipping sample id=2730095. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2745298. Maximum sequence length: 2049, sample length: 3492 [default0]:Skipping sample id=2711448. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2733581. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2481280. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2719079. Maximum sequence length: 2049, sample length: 2681 [default0]:Skipping sample id=2755947. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2754856. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2729826. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2741745. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2498930. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2754811. Maximum sequence length: 2049, sample length: 5027 [default0]:Skipping sample id=2753031. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2738531. Maximum sequence length: 2049, sample length: 3507 [default0]:Skipping sample id=2746363. 
Maximum sequence length: 2049, sample length: 4525 [default0]:Skipping sample id=2736399. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2491368. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2727175. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2736102. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2750276. Maximum sequence length: 2049, sample length: 2324 [default0]:Skipping sample id=2715861. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2751940. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2743250. Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2478535. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2733964. Maximum sequence length: 2049, sample length: 3359 [default0]:Skipping sample id=2755508. Maximum sequence length: 2049, sample length: 3637 [default0]:Skipping sample id=2736525. Maximum sequence length: 2049, sample length: 4883 [default0]:Skipping sample id=2718088. Maximum sequence length: 2049, sample length: 3745 [default0]:Skipping sample id=2722174. Maximum sequence length: 2049, sample length: 3133 [default0]:Skipping sample id=2743565. Maximum sequence length: 2049, sample length: 2654 [default0]:Skipping sample id=2750064. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2756272. Maximum sequence length: 2049, sample length: 2639 [default0]:Skipping sample id=2738375. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2721696. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2736233. Maximum sequence length: 2049, sample length: 3597 [default0]:Skipping sample id=2740918. Maximum sequence length: 2049, sample length: 3168 [default0]:Skipping sample id=2734344. 
Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2730564. Maximum sequence length: 2049, sample length: 2441 [default0]:Skipping sample id=2744824. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2720117. Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2727585. Maximum sequence length: 2049, sample length: 2924 [default0]:Skipping sample id=2729066. Maximum sequence length: 2049, sample length: 3136 [default0]:Skipping sample id=2495498. Maximum sequence length: 2049, sample length: 2829 [default0]:Skipping sample id=2753090. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2732198. Maximum sequence length: 2049, sample length: 3724 [default0]:Skipping sample id=2753943. Maximum sequence length: 2049, sample length: 5621 [default0]:Skipping sample id=2728660. Maximum sequence length: 2049, sample length: 4085 [default0]:Skipping sample id=2739390. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2479028. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2749304. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2711507. Maximum sequence length: 2049, sample length: 4691 [default0]:Skipping sample id=2736806. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2737992. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2722602. Maximum sequence length: 2049, sample length: 4926 [default0]:Skipping sample id=2730422. Maximum sequence length: 2049, sample length: 3229 [default0]:Skipping sample id=2744307. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2719021. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2482269. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2745977. 
Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2725850. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2712426. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2750748. Maximum sequence length: 2049, sample length: 3579 [default0]:Skipping sample id=2733951. Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2756205. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2744523. Maximum sequence length: 2049, sample length: 3381 [default0]:Skipping sample id=2756781. Maximum sequence length: 2049, sample length: 3255 [default0]:Skipping sample id=2751512. Maximum sequence length: 2049, sample length: 3627 [default0]:Skipping sample id=2716336. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2744163. Maximum sequence length: 2049, sample length: 3379 [default0]:Skipping sample id=2714626. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2711075. Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2487988. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2478051. Maximum sequence length: 2049, sample length: 2564 [default0]:Skipping sample id=2756666. Maximum sequence length: 2049, sample length: 4126 [default0]:Skipping sample id=2720606. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2742846. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2745195. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2718823. Maximum sequence length: 2049, sample length: 3032 [default0]:Skipping sample id=2751219. Maximum sequence length: 2049, sample length: 4010 [default0]:Skipping sample id=2747945. Maximum sequence length: 2049, sample length: 3698 [default0]:Skipping sample id=2487010. 
Maximum sequence length: 2049, sample length: 2279 [default0]:Skipping sample id=2740576. Maximum sequence length: 2049, sample length: 3261 [default0]:Skipping sample id=2712921. Maximum sequence length: 2049, sample length: 2056 [default0]:Skipping sample id=2756305. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2726320. Maximum sequence length: 2049, sample length: 3198 [default0]:Skipping sample id=2716891. Maximum sequence length: 2049, sample length: 2608 [default0]:Skipping sample id=2481492. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2727044. Maximum sequence length: 2049, sample length: 2876 [default0]:Skipping sample id=2747275. Maximum sequence length: 2049, sample length: 4628 [default0]:Skipping sample id=2717322. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2746849. Maximum sequence length: 2049, sample length: 3063 [default0]:Skipping sample id=2755369. Maximum sequence length: 2049, sample length: 5412 [default0]:Skipping sample id=2489289. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2744714. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2722543. Maximum sequence length: 2049, sample length: 2319 [default0]:Skipping sample id=2716472. Maximum sequence length: 2049, sample length: 3845 [default0]:Skipping sample id=2743322. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2731683. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2728728. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2712844. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2734595. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2746567. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2740325. 
Maximum sequence length: 2049, sample length: 3800 [default0]:Skipping sample id=2729166. Maximum sequence length: 2049, sample length: 5111 [default0]:Skipping sample id=2748118. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2713922. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2755319. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2490435. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2736166. Maximum sequence length: 2049, sample length: 2376 [default0]:Skipping sample id=2731151. Maximum sequence length: 2049, sample length: 2361 [default0]:Skipping sample id=2711740. Maximum sequence length: 2049, sample length: 2367 [default0]:Skipping sample id=2754347. Maximum sequence length: 2049, sample length: 3006 [default0]:Skipping sample id=2731721. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2748220. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2755666. Maximum sequence length: 2049, sample length: 2834 [default0]:Skipping sample id=2740271. Maximum sequence length: 2049, sample length: 2994 [default0]:Skipping sample id=2732131. Maximum sequence length: 2049, sample length: 3997 [default0]:Skipping sample id=2745811. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2742260. Maximum sequence length: 2049, sample length: 5012 [default0]:Skipping sample id=2468481. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2730545. Maximum sequence length: 2049, sample length: 5974 [default0]:Skipping sample id=2733032. Maximum sequence length: 2049, sample length: 2772 [default0]:Skipping sample id=2498247. Maximum sequence length: 2049, sample length: 2136 [default0]:Skipping sample id=2734885. Maximum sequence length: 2049, sample length: 2673 [default0]:Skipping sample id=2747469. 
Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2735381. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2482683. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2754720. Maximum sequence length: 2049, sample length: 2587 [default0]:Skipping sample id=2490797. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2711623. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2714932. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2728042. Maximum sequence length: 2049, sample length: 2846 [default0]:Skipping sample id=2476967. Maximum sequence length: 2049, sample length: 2561 [default0]:Skipping sample id=2742795. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2728053. Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2756041. Maximum sequence length: 2049, sample length: 3081 [default0]:Skipping sample id=2479929. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2748274. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2741796. Maximum sequence length: 2049, sample length: 2642 [default0]:Skipping sample id=2728091. Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2493227. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2729628. Maximum sequence length: 2049, sample length: 2450 [default0]:Skipping sample id=2723945. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2467838. Maximum sequence length: 2049, sample length: 2157 [default0]:Skipping sample id=2718272. Maximum sequence length: 2049, sample length: 2895 [default0]:Skipping sample id=2749591. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2482895. 
Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2741875. Maximum sequence length: 2049, sample length: 4783 [default0]:Skipping sample id=2738119. Maximum sequence length: 2049, sample length: 2734 [default0]:Skipping sample id=2712992. Maximum sequence length: 2049, sample length: 5381 [default0]:Skipping sample id=2725068. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2484050. Maximum sequence length: 2049, sample length: 2388 [default0]:Skipping sample id=2732434. Maximum sequence length: 2049, sample length: 3871 [default0]:Skipping sample id=2716494. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2713157. Maximum sequence length: 2049, sample length: 2657 [default0]:Skipping sample id=2712243. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2468581. Maximum sequence length: 2049, sample length: 2762 [default0]:Skipping sample id=2736651. Maximum sequence length: 2049, sample length: 2396 [default0]:Skipping sample id=2752195. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2749814. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2485574. Maximum sequence length: 2049, sample length: 2399 [default0]:Skipping sample id=2739093. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2724697. Maximum sequence length: 2049, sample length: 2169 [default0]:Skipping sample id=2718854. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2479049. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2729095. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2712450. Maximum sequence length: 2049, sample length: 2534 [default0]:Skipping sample id=2723788. Maximum sequence length: 2049, sample length: 3086 [default0]:Skipping sample id=2730423. 
Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2712286. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2740448. Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2717618. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2754233. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2756575. Maximum sequence length: 2049, sample length: 2292 [default0]:Skipping sample id=2712655. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2483433. Maximum sequence length: 2049, sample length: 2484 [default0]:Skipping sample id=2730113. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2739738. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2734273. Maximum sequence length: 2049, sample length: 5573 [default0]:Skipping sample id=2747527. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2484880. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2746782. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2726382. Maximum sequence length: 2049, sample length: 3264 [default0]:Skipping sample id=2746503. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2743679. Maximum sequence length: 2049, sample length: 2544 [default0]:Skipping sample id=2726022. Maximum sequence length: 2049, sample length: 2378 [default0]:Skipping sample id=2727386. Maximum sequence length: 2049, sample length: 3561 [default0]:Skipping sample id=2499400. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2745044. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2745735. Maximum sequence length: 2049, sample length: 4593 [default0]:Skipping sample id=2714072. 
Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2750793. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2743348. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2741677. Maximum sequence length: 2049, sample length: 3905 [default0]:Skipping sample id=2747782. Maximum sequence length: 2049, sample length: 2187 [default0]:Skipping sample id=2483538. Maximum sequence length: 2049, sample length: 2423 [default0]:Skipping sample id=2733144. Maximum sequence length: 2049, sample length: 5347 [default0]:Skipping sample id=2718821. Maximum sequence length: 2049, sample length: 5125 [default0]:Skipping sample id=2724123. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2722569. Maximum sequence length: 2049, sample length: 2986 [default0]:Skipping sample id=2747615. Maximum sequence length: 2049, sample length: 2400 [default0]:Skipping sample id=2731949. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2740063. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2716135. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2489244. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2747655. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2725645. Maximum sequence length: 2049, sample length: 3455 [default0]:Skipping sample id=2747478. Maximum sequence length: 2049, sample length: 2890 [default0]:Skipping sample id=2726740. Maximum sequence length: 2049, sample length: 3858 [default0]:Skipping sample id=2488721. Maximum sequence length: 2049, sample length: 2815 [default0]:Skipping sample id=2719605. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2499316. Maximum sequence length: 2049, sample length: 2134 [default0]:Skipping sample id=2735940. 
Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2724732. Maximum sequence length: 2049, sample length: 2373 [default0]:Skipping sample id=2730732. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2722734. Maximum sequence length: 2049, sample length: 5093 [default0]:Skipping sample id=2727146. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2731056. Maximum sequence length: 2049, sample length: 3259 [default0]:Skipping sample id=2711712. Maximum sequence length: 2049, sample length: 2550 [default0]:Skipping sample id=2723386. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2725612. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2757060. Maximum sequence length: 2049, sample length: 2524 [default0]:Skipping sample id=2496242. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2729392. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2753509. Maximum sequence length: 2049, sample length: 2496 [default0]:Skipping sample id=2488774. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2748249. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2484721. Maximum sequence length: 2049, sample length: 2086 [default0]:Skipping sample id=2717060. Maximum sequence length: 2049, sample length: 3757 [default0]:Skipping sample id=2719999. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2498139. Maximum sequence length: 2049, sample length: 2346 [default0]:Skipping sample id=2489853. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2716296. Maximum sequence length: 2049, sample length: 2406 [default0]:Skipping sample id=2713731. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2745723. 
Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2483956. Maximum sequence length: 2049, sample length: 2296 [default0]:Skipping sample id=2497897. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2752184. Maximum sequence length: 2049, sample length: 3105 [default0]:Skipping sample id=2719741. Maximum sequence length: 2049, sample length: 3400 [default0]:Skipping sample id=2467688. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2729075. Maximum sequence length: 2049, sample length: 2725 [default0]:Skipping sample id=2745218. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2755810. Maximum sequence length: 2049, sample length: 3201 [default0]:Skipping sample id=2724650. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2722101. Maximum sequence length: 2049, sample length: 3782 [default0]:Skipping sample id=2723609. Maximum sequence length: 2049, sample length: 3712 [default0]:Skipping sample id=2482286. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2734519. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2723493. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2752035. Maximum sequence length: 2049, sample length: 3557 [default0]:Skipping sample id=2751911. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2478219. Maximum sequence length: 2049, sample length: 2859 [default0]:Skipping sample id=2735020. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2754194. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2732850. Maximum sequence length: 2049, sample length: 2401 [default0]:Skipping sample id=2724349. Maximum sequence length: 2049, sample length: 5320 [default0]:Skipping sample id=2731276. 
Maximum sequence length: 2049, sample length: 3867 [default0]:Skipping sample id=2477435. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2748376. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2487660. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2734629. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2748063. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2748114. Maximum sequence length: 2049, sample length: 5179 [default0]:Skipping sample id=2738726. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2488376. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2743232. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2712567. Maximum sequence length: 2049, sample length: 3469 [default0]:Skipping sample id=2468401. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2730633. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2481311. Maximum sequence length: 2049, sample length: 3670 [default0]:Skipping sample id=2717753. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2740237. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2749648. Maximum sequence length: 2049, sample length: 2087 [default0]:Skipping sample id=2736025. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2739703. Maximum sequence length: 2049, sample length: 2217 [default0]:Skipping sample id=2755217. Maximum sequence length: 2049, sample length: 2231 [default0]:Skipping sample id=2736355. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2727652. Maximum sequence length: 2049, sample length: 3572 [default0]:Skipping sample id=2757076. 
Maximum sequence length: 2049, sample length: 3343 [default0]:Skipping sample id=2735150. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2722653. Maximum sequence length: 2049, sample length: 3291 [default0]:Skipping sample id=2734776. Maximum sequence length: 2049, sample length: 3483 [default0]:Skipping sample id=2719868. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2731067. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2477035. Maximum sequence length: 2049, sample length: 2171 [default0]:Skipping sample id=2756659. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2745851. Maximum sequence length: 2049, sample length: 7555 [default0]:Skipping sample id=2741637. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2466820. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2739758. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2751897. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2732379. Maximum sequence length: 2049, sample length: 2336 [default0]:Skipping sample id=2715388. Maximum sequence length: 2049, sample length: 6431 [default0]:Skipping sample id=2735835. Maximum sequence length: 2049, sample length: 2287 [default0]:Skipping sample id=2717403. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2727116. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2745702. Maximum sequence length: 2049, sample length: 2755 [default0]:Skipping sample id=2485706. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2754711. Maximum sequence length: 2049, sample length: 3042 [default0]:Skipping sample id=2724747. Maximum sequence length: 2049, sample length: 7624 [default0]:Skipping sample id=2729817. 
Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2746521. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2717687. Maximum sequence length: 2049, sample length: 4334 [default0]:Skipping sample id=2754719. Maximum sequence length: 2049, sample length: 2817 [default0]:Skipping sample id=2738161. Maximum sequence length: 2049, sample length: 2700 [default0]:Skipping sample id=2728191. Maximum sequence length: 2049, sample length: 3208 [default0]:Skipping sample id=2738103. Maximum sequence length: 2049, sample length: 2185 [default0]:Skipping sample id=2466883. Maximum sequence length: 2049, sample length: 2553 [default0]:Skipping sample id=2490074. Maximum sequence length: 2049, sample length: 2879 [default0]:Skipping sample id=2719800. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2752920. Maximum sequence length: 2049, sample length: 2724 [default0]:Skipping sample id=2738960. Maximum sequence length: 2049, sample length: 2999 [default0]:Skipping sample id=2477075. Maximum sequence length: 2049, sample length: 2271 [default0]:Skipping sample id=2478504. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2743652. Maximum sequence length: 2049, sample length: 3024 [default0]:Skipping sample id=2716958. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2468133. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2736367. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2720021. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2723760. Maximum sequence length: 2049, sample length: 2395 [default0]:Skipping sample id=2753969. Maximum sequence length: 2049, sample length: 3148 [default0]:Skipping sample id=2732628. Maximum sequence length: 2049, sample length: 5133 [default0]:Skipping sample id=2711715. 
Maximum sequence length: 2049, sample length: 3243 [default0]:Skipping sample id=2713441. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2740704. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2729758. Maximum sequence length: 2049, sample length: 2930 [default0]:Skipping sample id=2735993. Maximum sequence length: 2049, sample length: 2126 [default0]:Skipping sample id=2724912. Maximum sequence length: 2049, sample length: 2701 [default0]:Skipping sample id=2726869. Maximum sequence length: 2049, sample length: 3452 [default0]:Skipping sample id=2718035. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2736761. Maximum sequence length: 2049, sample length: 3828 [default0]:Skipping sample id=2749897. Maximum sequence length: 2049, sample length: 8478 [default0]:Skipping sample id=2726441. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2715240. Maximum sequence length: 2049, sample length: 2732 [default0]:Skipping sample id=2713412. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2746413. Maximum sequence length: 2049, sample length: 2795 [default0]:Skipping sample id=2753235. Maximum sequence length: 2049, sample length: 5808 [default0]:Skipping sample id=2721137. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2751038. Maximum sequence length: 2049, sample length: 2219 [default0]:Skipping sample id=2711826. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2743069. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2732550. Maximum sequence length: 2049, sample length: 7261 [default0]:Skipping sample id=2726720. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2497207. Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2748263. 
Maximum sequence length: 2049, sample length: 4872 [default0]:Skipping sample id=2750597. Maximum sequence length: 2049, sample length: 3211 [default0]:Skipping sample id=2715643. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2726925. Maximum sequence length: 2049, sample length: 3633 [default0]:Skipping sample id=2756712. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2735242. Maximum sequence length: 2049, sample length: 4283 [default0]:Skipping sample id=2717187. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2730466. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2488815. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2492993. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2735528. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2725025. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2713319. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2495710. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2732973. Maximum sequence length: 2049, sample length: 2286 [default0]:Skipping sample id=2746431. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2711354. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2726559. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2740122. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2737811. Maximum sequence length: 2049, sample length: 2195 [default0]:Skipping sample id=2741638. Maximum sequence length: 2049, sample length: 4120 [default0]:Skipping sample id=2748900. Maximum sequence length: 2049, sample length: 2224 [default0]:Skipping sample id=2483094. 
Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2720380. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2730068. Maximum sequence length: 2049, sample length: 3686 [default0]:Skipping sample id=2749681. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2731600. Maximum sequence length: 2049, sample length: 4059 [default0]:Skipping sample id=2483948. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2734263. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2750888. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2711203. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2751292. Maximum sequence length: 2049, sample length: 2068 [default0]:Skipping sample id=2742392. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2484724. Maximum sequence length: 2049, sample length: 3471 [default0]:Skipping sample id=2746824. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2737759. Maximum sequence length: 2049, sample length: 5315 [default0]:Skipping sample id=2739593. Maximum sequence length: 2049, sample length: 3628 [default0]:Skipping sample id=2715269. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2740074. Maximum sequence length: 2049, sample length: 6239 [default0]:Skipping sample id=2732062. Maximum sequence length: 2049, sample length: 3587 [default0]:Skipping sample id=2730372. Maximum sequence length: 2049, sample length: 3437 [default0]:Skipping sample id=2740937. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2738996. Maximum sequence length: 2049, sample length: 3575 [default0]:Skipping sample id=2740427. Maximum sequence length: 2049, sample length: 3504 [default0]:Skipping sample id=2722359. 
Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2725604. Maximum sequence length: 2049, sample length: 2728 [default0]:Skipping sample id=2713313. Maximum sequence length: 2049, sample length: 3732 [default0]:Skipping sample id=2750503. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2750144. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2487894. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2739439. Maximum sequence length: 2049, sample length: 2228 [default0]:Skipping sample id=2743118. Maximum sequence length: 2049, sample length: 4687 [default0]:Skipping sample id=2756492. Maximum sequence length: 2049, sample length: 4086 [default0]:Skipping sample id=2749753. Maximum sequence length: 2049, sample length: 2965 [default0]:Skipping sample id=2494992. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2741288. Maximum sequence length: 2049, sample length: 3262 [default0]:Skipping sample id=2740908. Maximum sequence length: 2049, sample length: 3517 [default0]:Skipping sample id=2748099. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2735729. Maximum sequence length: 2049, sample length: 3823 [default0]:Skipping sample id=2735594. Maximum sequence length: 2049, sample length: 2176 [default0]:Skipping sample id=2744642. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2723755. Maximum sequence length: 2049, sample length: 2258 [default0]:Skipping sample id=2468264. Maximum sequence length: 2049, sample length: 3548 [default0]:Skipping sample id=2729813. Maximum sequence length: 2049, sample length: 3521 [default0]:Skipping sample id=2724025. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2712755. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2726879. 
Maximum sequence length: 2049, sample length: 4036 [default0]:Skipping sample id=2736765. Maximum sequence length: 2049, sample length: 2661 [default0]:Skipping sample id=2755654. Maximum sequence length: 2049, sample length: 4403 [default0]:Skipping sample id=2714480. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2481277. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2744001. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2724513. Maximum sequence length: 2049, sample length: 4308 [default0]:Skipping sample id=2750629. Maximum sequence length: 2049, sample length: 4232 [default0]:Skipping sample id=2477180. Maximum sequence length: 2049, sample length: 3596 [default0]:Skipping sample id=2725162. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2744240. Maximum sequence length: 2049, sample length: 3601 [default0]:Skipping sample id=2715647. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2728051. Maximum sequence length: 2049, sample length: 2161 [default0]:Skipping sample id=2722113. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2499346. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2751473. Maximum sequence length: 2049, sample length: 3816 [default0]:Skipping sample id=2742634. Maximum sequence length: 2049, sample length: 4931 [default0]:Skipping sample id=2746100. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2739353. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2721583. Maximum sequence length: 2049, sample length: 4445 [default0]:Skipping sample id=2735864. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2483361. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2747159. 
Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2738968. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2486127. Maximum sequence length: 2049, sample length: 2433 [default0]:Skipping sample id=2712730. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2736958. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2757004. Maximum sequence length: 2049, sample length: 2679 [default0]:Skipping sample id=2728702. Maximum sequence length: 2049, sample length: 3154 [default0]:Skipping sample id=2717463. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2733705. Maximum sequence length: 2049, sample length: 6481 [default0]:Skipping sample id=2748714. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2494190. Maximum sequence length: 2049, sample length: 2379 [default0]:Skipping sample id=2493900. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2711860. Maximum sequence length: 2049, sample length: 5135 [default0]:Skipping sample id=2736627. Maximum sequence length: 2049, sample length: 3316 [default0]:Skipping sample id=2718860. Maximum sequence length: 2049, sample length: 2103 [default0]:Skipping sample id=2740848. Maximum sequence length: 2049, sample length: 2259 [default0]:Skipping sample id=2736452. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2718028. Maximum sequence length: 2049, sample length: 4326 [default0]:Skipping sample id=2714471. Maximum sequence length: 2049, sample length: 3613 [default0]:Skipping sample id=2741077. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2470567. Maximum sequence length: 2049, sample length: 2246 [default0]:Skipping sample id=2749994. Maximum sequence length: 2049, sample length: 5083 [default0]:Skipping sample id=2728275. 
Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2715119. Maximum sequence length: 2049, sample length: 5610 [default0]:Skipping sample id=2734048. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2713674. Maximum sequence length: 2049, sample length: 6810 [default0]:Skipping sample id=2741856. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2744873. Maximum sequence length: 2049, sample length: 2800 [default0]:Skipping sample id=2755494. Maximum sequence length: 2049, sample length: 5055 [default0]:Skipping sample id=2751778. Maximum sequence length: 2049, sample length: 2602 [default0]:Skipping sample id=2720207. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2751793. Maximum sequence length: 2049, sample length: 3373 [default0]:Skipping sample id=2756091. Maximum sequence length: 2049, sample length: 4964 [default0]:Skipping sample id=2725293. Maximum sequence length: 2049, sample length: 3246 [default0]:Skipping sample id=2756276. Maximum sequence length: 2049, sample length: 2168 [default0]:Skipping sample id=2731447. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2494601. Maximum sequence length: 2049, sample length: 2247 [default0]:Skipping sample id=2716697. Maximum sequence length: 2049, sample length: 3870 [default0]:Skipping sample id=2731306. Maximum sequence length: 2049, sample length: 2940 [default0]:Skipping sample id=2756778. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2726246. Maximum sequence length: 2049, sample length: 2551 [default0]:Skipping sample id=2718106. Maximum sequence length: 2049, sample length: 3279 [default0]:Skipping sample id=2715496. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2710959. Maximum sequence length: 2049, sample length: 2663 [default0]:Skipping sample id=2744677. 
Maximum sequence length: 2049, sample length: 2582 [default0]:Skipping sample id=2485953. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2720558. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2736890. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2727649. Maximum sequence length: 2049, sample length: 4251 [default0]:Skipping sample id=2731598. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2727707. Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2724572. Maximum sequence length: 2049, sample length: 2777 [default0]:Skipping sample id=2746655. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2721882. Maximum sequence length: 2049, sample length: 3687 [default0]:Skipping sample id=2470123. Maximum sequence length: 2049, sample length: 2178 [default0]:Skipping sample id=2747917. Maximum sequence length: 2049, sample length: 3335 [default0]:Skipping sample id=2734466. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2718126. Maximum sequence length: 2049, sample length: 2285 [default0]:Skipping sample id=2734241. Maximum sequence length: 2049, sample length: 3677 [default0]:Skipping sample id=2744340. Maximum sequence length: 2049, sample length: 2929 [default0]:Skipping sample id=2734984. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2744674. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2712858. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2737856. Maximum sequence length: 2049, sample length: 5171 [default0]:Skipping sample id=2735571. Maximum sequence length: 2049, sample length: 2093 [default0]:Skipping sample id=2755568. Maximum sequence length: 2049, sample length: 3312 [default0]:Skipping sample id=2726952. 
Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2717559. Maximum sequence length: 2049, sample length: 2911 [default0]:Skipping sample id=2719228. Maximum sequence length: 2049, sample length: 3030 [default0]:Skipping sample id=2743915. Maximum sequence length: 2049, sample length: 2853 [default0]:Skipping sample id=2729663. Maximum sequence length: 2049, sample length: 3258 [default0]:Skipping sample id=2713940. Maximum sequence length: 2049, sample length: 2583 [default0]:Skipping sample id=2723454. Maximum sequence length: 2049, sample length: 2682 [default0]:Skipping sample id=2483313. Maximum sequence length: 2049, sample length: 2428 [default0]:Skipping sample id=2742606. Maximum sequence length: 2049, sample length: 5458 [default0]:Skipping sample id=2738456. Maximum sequence length: 2049, sample length: 2843 [default0]:Skipping sample id=2730212. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2731746. Maximum sequence length: 2049, sample length: 3206 [default0]:Skipping sample id=2728920. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2730143. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2753755. Maximum sequence length: 2049, sample length: 2555 [default0]:Skipping sample id=2738170. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2711403. Maximum sequence length: 2049, sample length: 4430 [default0]:Skipping sample id=2749595. Maximum sequence length: 2049, sample length: 4927 [default0]:Skipping sample id=2739301. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2725799. Maximum sequence length: 2049, sample length: 3325 [default0]:Skipping sample id=2737889. Maximum sequence length: 2049, sample length: 2826 [default0]:Skipping sample id=2744503. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2713110. 
Maximum sequence length: 2049, sample length: 2274 [default0]:Skipping sample id=2740353. Maximum sequence length: 2049, sample length: 2463 [default0]:Skipping sample id=2471026. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2728000. Maximum sequence length: 2049, sample length: 4553 [default0]:Skipping sample id=2751620. Maximum sequence length: 2049, sample length: 3479 [default0]:Skipping sample id=2480809. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2755251. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2746686. Maximum sequence length: 2049, sample length: 2916 [default0]:Skipping sample id=2738845. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2725671. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2740032. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2735447. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2742808. Maximum sequence length: 2049, sample length: 7067 [default0]:Skipping sample id=2711054. Maximum sequence length: 2049, sample length: 3173 [default0]:Skipping sample id=2726220. Maximum sequence length: 2049, sample length: 3256 [default0]:Skipping sample id=2735936. Maximum sequence length: 2049, sample length: 4591 [default0]:Skipping sample id=2742128. Maximum sequence length: 2049, sample length: 3653 [default0]:Skipping sample id=2467389. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2478439. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2726561. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2735213. Maximum sequence length: 2049, sample length: 4561 [default0]:Skipping sample id=2715567. Maximum sequence length: 2049, sample length: 3046 [default0]:Skipping sample id=2723058. 
Maximum sequence length: 2049, sample length: 3194 [default0]:Skipping sample id=2725932. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2742439. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2755819. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2734299. Maximum sequence length: 2049, sample length: 2283 [default0]:Skipping sample id=2725379. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2723518. Maximum sequence length: 2049, sample length: 3589 [default0]:Skipping sample id=2478844. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2712819. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2483635. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2737050. Maximum sequence length: 2049, sample length: 5829 [default0]:Skipping sample id=2735899. Maximum sequence length: 2049, sample length: 3585 [default0]:Skipping sample id=2712621. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2722902. Maximum sequence length: 2049, sample length: 2089 [default0]:Skipping sample id=2720390. Maximum sequence length: 2049, sample length: 2443 [default0]:Skipping sample id=2715319. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2726820. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2738595. Maximum sequence length: 2049, sample length: 5270 [default0]:Skipping sample id=2741919. Maximum sequence length: 2049, sample length: 5045 [default0]:Skipping sample id=2747910. Maximum sequence length: 2049, sample length: 2614 [default0]:Skipping sample id=2748217. Maximum sequence length: 2049, sample length: 2736 [default0]:Skipping sample id=2757031. Maximum sequence length: 2049, sample length: 3559 [default0]:Skipping sample id=2740460. 
Maximum sequence length: 2049, sample length: 2838 [default0]:Skipping sample id=2713497. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2736705. Maximum sequence length: 2049, sample length: 4642 [default0]:Skipping sample id=2756829. Maximum sequence length: 2049, sample length: 3496 [default0]:Skipping sample id=2720100. Maximum sequence length: 2049, sample length: 2299 [default0]:Skipping sample id=2717843. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2745162. Maximum sequence length: 2049, sample length: 3038 [default0]:Skipping sample id=2735989. Maximum sequence length: 2049, sample length: 2595 [default0]:Skipping sample id=2743005. Maximum sequence length: 2049, sample length: 5327 [default0]:Skipping sample id=2732569. Maximum sequence length: 2049, sample length: 3214 [default0]:Skipping sample id=2714485. Maximum sequence length: 2049, sample length: 4424 [default0]:Skipping sample id=2716781. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2466430. Maximum sequence length: 2049, sample length: 2849 [default0]:Skipping sample id=2745111. Maximum sequence length: 2049, sample length: 5406 [default0]:Skipping sample id=2714928. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2740387. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2466886. Maximum sequence length: 2049, sample length: 3612 [default0]:Skipping sample id=2750022. Maximum sequence length: 2049, sample length: 8128 [default0]:Skipping sample id=2751782. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2715708. Maximum sequence length: 2049, sample length: 3139 [default0]:Skipping sample id=2751851. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2718973. Maximum sequence length: 2049, sample length: 2070 [default0]:Skipping sample id=2756322. 
Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2739325. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2754405. Maximum sequence length: 2049, sample length: 2497 [default0]:Skipping sample id=2746454. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2750682. Maximum sequence length: 2049, sample length: 4542 [default0]:Skipping sample id=2732887. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2729246. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2737804. Maximum sequence length: 2049, sample length: 2420 [default0]:Skipping sample id=2737151. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2713401. Maximum sequence length: 2049, sample length: 2230 [default0]:Skipping sample id=2749762. Maximum sequence length: 2049, sample length: 2163 [default0]:Skipping sample id=2730947. Maximum sequence length: 2049, sample length: 2656 [default0]:Skipping sample id=2729581. Maximum sequence length: 2049, sample length: 2094 [default0]:Skipping sample id=2714489. Maximum sequence length: 2049, sample length: 7617 [default0]:Skipping sample id=2732074. Maximum sequence length: 2049, sample length: 4708 [default0]:Skipping sample id=2739625. Maximum sequence length: 2049, sample length: 2165 [default0]:Skipping sample id=2740316. Maximum sequence length: 2049, sample length: 2091 [default0]:Skipping sample id=2494251. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2715849. Maximum sequence length: 2049, sample length: 2887 [default0]:Skipping sample id=2728957. Maximum sequence length: 2049, sample length: 3968 [default0]:Skipping sample id=2715679. Maximum sequence length: 2049, sample length: 3056 [default0]:Skipping sample id=2736643. Maximum sequence length: 2049, sample length: 2445 [default0]:Skipping sample id=2748709. 
Maximum sequence length: 2049, sample length: 4025 [default0]:Skipping sample id=2477874. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2469788. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2487102. Maximum sequence length: 2049, sample length: 2703 [default0]:Skipping sample id=2721960. Maximum sequence length: 2049, sample length: 2363 [default0]:Skipping sample id=2754050. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2731436. Maximum sequence length: 2049, sample length: 2907 [default0]:Skipping sample id=2756102. Maximum sequence length: 2049, sample length: 2432 [default0]:Skipping sample id=2713182. Maximum sequence length: 2049, sample length: 4030 [default0]:Skipping sample id=2727516. Maximum sequence length: 2049, sample length: 2683 [default0]:Skipping sample id=2489487. Maximum sequence length: 2049, sample length: 2648 [default0]:Skipping sample id=2720985. Maximum sequence length: 2049, sample length: 3434 [default0]:Skipping sample id=2747695. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2717592. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2715420. Maximum sequence length: 2049, sample length: 4601 [default0]:Skipping sample id=2741335. Maximum sequence length: 2049, sample length: 2637 [default0]:Skipping sample id=2724228. Maximum sequence length: 2049, sample length: 2504 [default0]:Skipping sample id=2747216. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2714163. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2713904. Maximum sequence length: 2049, sample length: 3041 [default0]:Skipping sample id=2716752. Maximum sequence length: 2049, sample length: 3389 [default0]:Skipping sample id=2717519. Maximum sequence length: 2049, sample length: 2687 [default0]:Skipping sample id=2727478. 
Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2486653. Maximum sequence length: 2049, sample length: 2523 [default0]:Skipping sample id=2741524. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2468572. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2723587. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2733374. Maximum sequence length: 2049, sample length: 2108 [default0]:Skipping sample id=2731424. Maximum sequence length: 2049, sample length: 3292 [default0]:Skipping sample id=2727303. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2739716. Maximum sequence length: 2049, sample length: 2235 [default0]:Skipping sample id=2741753. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2490502. Maximum sequence length: 2049, sample length: 2832 [default0]:Skipping sample id=2733168. Maximum sequence length: 2049, sample length: 4792 [default0]:Skipping sample id=2495398. Maximum sequence length: 2049, sample length: 2787 [default0]:Skipping sample id=2746969. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2748306. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2739337. Maximum sequence length: 2049, sample length: 4920 [default0]:Skipping sample id=2751671. Maximum sequence length: 2049, sample length: 2474 [default0]:Skipping sample id=2714605. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2713060. Maximum sequence length: 2049, sample length: 3832 [default0]:Skipping sample id=2720044. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2719573. Maximum sequence length: 2049, sample length: 3021 [default0]:Skipping sample id=2752372. Maximum sequence length: 2049, sample length: 6493 [default0]:Skipping sample id=2741690. 
Maximum sequence length: 2049, sample length: 4689 [default0]:Skipping sample id=2752313. Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2742652. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2714119. Maximum sequence length: 2049, sample length: 8234 [default0]:Skipping sample id=2733344. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2751055. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2717224. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2718361. Maximum sequence length: 2049, sample length: 2905 [default0]:Skipping sample id=2749398. Maximum sequence length: 2049, sample length: 3547 [default0]:Skipping sample id=2755738. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2751143. Maximum sequence length: 2049, sample length: 2641 [default0]:Skipping sample id=2751877. Maximum sequence length: 2049, sample length: 2718 [default0]:Skipping sample id=2755457. Maximum sequence length: 2049, sample length: 2767 [default0]:Skipping sample id=2484091. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2748027. Maximum sequence length: 2049, sample length: 2562 [default0]:Skipping sample id=2726172. Maximum sequence length: 2049, sample length: 3704 [default0]:Skipping sample id=2718177. Maximum sequence length: 2049, sample length: 3143 [default0]:Skipping sample id=2713208. Maximum sequence length: 2049, sample length: 2691 [default0]:Skipping sample id=2736859. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2731510. Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2749409. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2722036. Maximum sequence length: 2049, sample length: 2471 [default0]:Skipping sample id=2713885. 
Maximum sequence length: 2049, sample length: 4152 [default0]:Skipping sample id=2737628. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2712644. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2745454. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2754670. Maximum sequence length: 2049, sample length: 2802 [default0]:Skipping sample id=2732191. Maximum sequence length: 2049, sample length: 2632 [default0]:Skipping sample id=2739706. Maximum sequence length: 2049, sample length: 3775 [default0]:Skipping sample id=2490565. Maximum sequence length: 2049, sample length: 2173 [default0]:Skipping sample id=2491827. Maximum sequence length: 2049, sample length: 2660 [default0]:Skipping sample id=2723731. Maximum sequence length: 2049, sample length: 3344 [default0]:Skipping sample id=2743286. Maximum sequence length: 2049, sample length: 4271 [default0]:Skipping sample id=2714890. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2730811. Maximum sequence length: 2049, sample length: 2508 [default0]:Skipping sample id=2752713. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2756748. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2713330. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2711966. Maximum sequence length: 2049, sample length: 2465 [default0]:Skipping sample id=2498802. Maximum sequence length: 2049, sample length: 2160 [default0]:Skipping sample id=2479180. Maximum sequence length: 2049, sample length: 3012 [default0]:Skipping sample id=2729487. Maximum sequence length: 2049, sample length: 3877 [default0]:Skipping sample id=2725013. Maximum sequence length: 2049, sample length: 2921 [default0]:Skipping sample id=2723829. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2738372. 
Maximum sequence length: 2049, sample length: 2680 [default0]:Skipping sample id=2718572. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2750938. Maximum sequence length: 2049, sample length: 3125 [default0]:Skipping sample id=2718280. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2494714. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2715659. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2734116. Maximum sequence length: 2049, sample length: 4751 [default0]:Skipping sample id=2732081. Maximum sequence length: 2049, sample length: 2077 [default0]:Skipping sample id=2732394. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2746045. Maximum sequence length: 2049, sample length: 2243 [default0]:Skipping sample id=2720038. Maximum sequence length: 2049, sample length: 3362 [default0]:Skipping sample id=2747282. Maximum sequence length: 2049, sample length: 2335 [default0]:Skipping sample id=2713098. Maximum sequence length: 2049, sample length: 3054 [default0]:Skipping sample id=2729518. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2735943. Maximum sequence length: 2049, sample length: 4438 [default0]:Skipping sample id=2747126. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2743369. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2752973. Maximum sequence length: 2049, sample length: 5139 [default0]:Skipping sample id=2717314. Maximum sequence length: 2049, sample length: 3390 [default0]:Skipping sample id=2479440. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2727179. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2733428. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2716868. 
Maximum sequence length: 2049, sample length: 3430 [default0]:Skipping sample id=2487778. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2735661. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2714784. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2737558. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2477463. Maximum sequence length: 2049, sample length: 3174 [default0]:Skipping sample id=2731165. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2756875. Maximum sequence length: 2049, sample length: 2568 [default0]:Skipping sample id=2494627. Maximum sequence length: 2049, sample length: 2650 [default0]:Skipping sample id=2753920. Maximum sequence length: 2049, sample length: 2174 [default0]:Skipping sample id=2754108. Maximum sequence length: 2049, sample length: 4814 [default0]:Skipping sample id=2716704. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2748191. Maximum sequence length: 2049, sample length: 2830 [default0]:Skipping sample id=2729378. Maximum sequence length: 2049, sample length: 3422 [default0]:Skipping sample id=2485138. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2751809. Maximum sequence length: 2049, sample length: 3384 [default0]:Skipping sample id=2730202. Maximum sequence length: 2049, sample length: 3026 [default0]:Skipping sample id=2731134. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2719783. Maximum sequence length: 2049, sample length: 2694 [default0]:Skipping sample id=2753777. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2734440. Maximum sequence length: 2049, sample length: 3495 [default0]:Skipping sample id=2729131. Maximum sequence length: 2049, sample length: 3315 [default0]:Skipping sample id=2714124. 
Maximum sequence length: 2049, sample length: 2814 [default0]:Skipping sample id=2713158. Maximum sequence length: 2049, sample length: 2985 [default0]:Skipping sample id=2740602. Maximum sequence length: 2049, sample length: 3703 [default0]:Skipping sample id=2482450. Maximum sequence length: 2049, sample length: 2132 [default0]:Skipping sample id=2466282. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2722692. Maximum sequence length: 2049, sample length: 3297 [default0]:Skipping sample id=2726107. Maximum sequence length: 2049, sample length: 2894 [default0]:Skipping sample id=2717605. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2721377. Maximum sequence length: 2049, sample length: 2517 [default0]:Skipping sample id=2712970. Maximum sequence length: 2049, sample length: 4938 [default0]:Skipping sample id=2719581. Maximum sequence length: 2049, sample length: 6638 [default0]:Skipping sample id=2721598. Maximum sequence length: 2049, sample length: 6668 [default0]:Skipping sample id=2748415. Maximum sequence length: 2049, sample length: 4279 [default0]:Skipping sample id=2739667. Maximum sequence length: 2049, sample length: 3453 [default0]:Skipping sample id=2757070. Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2718937. Maximum sequence length: 2049, sample length: 2312 [default0]:Skipping sample id=2738639. Maximum sequence length: 2049, sample length: 4586 [default0]:Skipping sample id=2749524. Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2722907. Maximum sequence length: 2049, sample length: 2804 [default0]:Skipping sample id=2734056. Maximum sequence length: 2049, sample length: 2982 [default0]:Skipping sample id=2752503. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2755157. Maximum sequence length: 2049, sample length: 4526 [default0]:Skipping sample id=2734934. 
Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2716528. Maximum sequence length: 2049, sample length: 3286 [default0]:Skipping sample id=2727341. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2734390. Maximum sequence length: 2049, sample length: 4635 [default0]:Skipping sample id=2755346. Maximum sequence length: 2049, sample length: 2646 [default0]:Skipping sample id=2710964. Maximum sequence length: 2049, sample length: 2321 [default0]:Skipping sample id=2496432. Maximum sequence length: 2049, sample length: 3014 [default0]:Skipping sample id=2721419. Maximum sequence length: 2049, sample length: 2408 [default0]:Skipping sample id=2713475. Maximum sequence length: 2049, sample length: 2097 [default0]:Skipping sample id=2746859. Maximum sequence length: 2049, sample length: 3100 [default0]:Skipping sample id=2478181. Maximum sequence length: 2049, sample length: 3387 [default0]:Skipping sample id=2755446. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2741243. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2737719. Maximum sequence length: 2049, sample length: 3607 [default0]:Skipping sample id=2712842. Maximum sequence length: 2049, sample length: 4910 [default0]:Skipping sample id=2730040. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2745461. Maximum sequence length: 2049, sample length: 3116 [default0]:Skipping sample id=2752796. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2716230. Maximum sequence length: 2049, sample length: 4862 [default0]:Skipping sample id=2734187. Maximum sequence length: 2049, sample length: 2269 [default0]:Skipping sample id=2492086. Maximum sequence length: 2049, sample length: 2109 [default0]:Skipping sample id=2752746. Maximum sequence length: 2049, sample length: 2348 [default0]:Skipping sample id=2740685. 
Maximum sequence length: 2049, sample length: 3699 [default0]:Skipping sample id=2724882. Maximum sequence length: 2049, sample length: 4331 [default0]:Skipping sample id=2738081. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2729351. Maximum sequence length: 2049, sample length: 2579 [default0]:Skipping sample id=2753522. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2734708. Maximum sequence length: 2049, sample length: 2552 [default0]:Skipping sample id=2485595. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2465791. Maximum sequence length: 2049, sample length: 2095 [default0]:Skipping sample id=2493158. Maximum sequence length: 2049, sample length: 2192 [default0]:Skipping sample id=2714649. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2714438. Maximum sequence length: 2049, sample length: 3802 [default0]:Skipping sample id=2713959. Maximum sequence length: 2049, sample length: 3221 [default0]:Skipping sample id=2465889. Maximum sequence length: 2049, sample length: 2418 [default0]:Skipping sample id=2754547. Maximum sequence length: 2049, sample length: 5108 [default0]:Skipping sample id=2744055. Maximum sequence length: 2049, sample length: 3739 [default0]:Skipping sample id=2741852. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2730174. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2490429. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2711999. Maximum sequence length: 2049, sample length: 3571 [default0]:Skipping sample id=2743574. Maximum sequence length: 2049, sample length: 3085 [default0]:Skipping sample id=2735052. Maximum sequence length: 2049, sample length: 3544 [default0]:Skipping sample id=2732697. Maximum sequence length: 2049, sample length: 3204 [default0]:Skipping sample id=2726605. 
Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2738693. Maximum sequence length: 2049, sample length: 3025 [default0]:Skipping sample id=2743048. Maximum sequence length: 2049, sample length: 2416 [default0]:Skipping sample id=2714949. Maximum sequence length: 2049, sample length: 3121 [default0]:Skipping sample id=2498380. Maximum sequence length: 2049, sample length: 2850 [default0]:Skipping sample id=2730011. Maximum sequence length: 2049, sample length: 4314 [default0]:Skipping sample id=2715078. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2711844. Maximum sequence length: 2049, sample length: 2629 [default0]:Skipping sample id=2487699. Maximum sequence length: 2049, sample length: 2137 [default0]:Skipping sample id=2728132. Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2727025. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2729236. Maximum sequence length: 2049, sample length: 2723 [default0]:Skipping sample id=2754604. Maximum sequence length: 2049, sample length: 3319 [default0]:Skipping sample id=2743820. Maximum sequence length: 2049, sample length: 3946 [default0]:Skipping sample id=2477844. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2496899. Maximum sequence length: 2049, sample length: 2081 [default0]:Skipping sample id=2713951. Maximum sequence length: 2049, sample length: 4589 [default0]:Skipping sample id=2727101. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2733741. Maximum sequence length: 2049, sample length: 4948 [default0]:Skipping sample id=2741283. Maximum sequence length: 2049, sample length: 2539 [default0]:Skipping sample id=2746602. Maximum sequence length: 2049, sample length: 2314 [default0]:Skipping sample id=2752610. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2755181. 
Maximum sequence length: 2049, sample length: 3707 [default0]:Skipping sample id=2736509. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2712212. Maximum sequence length: 2049, sample length: 2878 [default0]:Skipping sample id=2719927. Maximum sequence length: 2049, sample length: 5235 [default0]:Skipping sample id=2724047. Maximum sequence length: 2049, sample length: 2394 [default0]:Skipping sample id=2713061. Maximum sequence length: 2049, sample length: 2275 [default0]:Skipping sample id=2743712. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2718858. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2747405. Maximum sequence length: 2049, sample length: 2442 [default0]:Skipping sample id=2747099. Maximum sequence length: 2049, sample length: 2570 [default0]:Skipping sample id=2735152. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2711493. Maximum sequence length: 2049, sample length: 4278 [default0]:Skipping sample id=2729074. Maximum sequence length: 2049, sample length: 3002 [default0]:Skipping sample id=2493436. Maximum sequence length: 2049, sample length: 2092 [default0]:Skipping sample id=2736232. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2735719. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2737711. Maximum sequence length: 2049, sample length: 6256 [default0]:Skipping sample id=2478965. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2715993. Maximum sequence length: 2049, sample length: 4288 [default0]:Skipping sample id=2743061. Maximum sequence length: 2049, sample length: 2412 [default0]:Skipping sample id=2737087. Maximum sequence length: 2049, sample length: 2935 [default0]:Skipping sample id=2733750. Maximum sequence length: 2049, sample length: 4855 [default0]:Skipping sample id=2718666. 
Maximum sequence length: 2049, sample length: 2362 [default0]:Skipping sample id=2729515. Maximum sequence length: 2049, sample length: 2121 [default0]:Skipping sample id=2748781. Maximum sequence length: 2049, sample length: 3031 [default0]:Skipping sample id=2740582. Maximum sequence length: 2049, sample length: 3463 [default0]:Skipping sample id=2732689. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718374. Maximum sequence length: 2049, sample length: 3534 [default0]:Skipping sample id=2722310. Maximum sequence length: 2049, sample length: 2634 [default0]:Skipping sample id=2497858. Maximum sequence length: 2049, sample length: 2155 [default0]:Skipping sample id=2723728. Maximum sequence length: 2049, sample length: 6158 [default0]:Skipping sample id=2744620. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2756640. Maximum sequence length: 2049, sample length: 3363 [default0]:Skipping sample id=2720398. Maximum sequence length: 2049, sample length: 3844 [default0]:Skipping sample id=2754726. Maximum sequence length: 2049, sample length: 2915 [default0]:Skipping sample id=2715111. Maximum sequence length: 2049, sample length: 2190 [default0]:Skipping sample id=2490751. Maximum sequence length: 2049, sample length: 3540 [default0]:Skipping sample id=2742972. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2724955. Maximum sequence length: 2049, sample length: 4577 [default0]:Skipping sample id=2750768. Maximum sequence length: 2049, sample length: 3688 [default0]:Skipping sample id=2739500. Maximum sequence length: 2049, sample length: 2818 [default0]:Skipping sample id=2725315. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2747395. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2725516. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2720432. 
Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2719502. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2479674. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2486816. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2711233. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2729136. Maximum sequence length: 2049, sample length: 3581 [default0]:Skipping sample id=2716110. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2712714. Maximum sequence length: 2049, sample length: 2306 [default0]:Skipping sample id=2755250. Maximum sequence length: 2049, sample length: 2628 [default0]:Skipping sample id=2723248. Maximum sequence length: 2049, sample length: 3045 [default0]:Skipping sample id=2726857. Maximum sequence length: 2049, sample length: 4228 [default0]:Skipping sample id=2733799. Maximum sequence length: 2049, sample length: 2380 [default0]:Skipping sample id=2726148. Maximum sequence length: 2049, sample length: 3624 [default0]:Skipping sample id=2719031. Maximum sequence length: 2049, sample length: 2936 [default0]:Skipping sample id=2737687. Maximum sequence length: 2049, sample length: 3689 [default0]:Skipping sample id=2729992. Maximum sequence length: 2049, sample length: 3074 [default0]:Skipping sample id=2751500. Maximum sequence length: 2049, sample length: 5167 [default0]:Skipping sample id=2714671. Maximum sequence length: 2049, sample length: 2162 [default0]:Skipping sample id=2714011. Maximum sequence length: 2049, sample length: 4370 [default0]:Skipping sample id=2735409. Maximum sequence length: 2049, sample length: 6616 [default0]:Skipping sample id=2711416. Maximum sequence length: 2049, sample length: 3785 [default0]:Skipping sample id=2498883. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2488202. 
Maximum sequence length: 2049, sample length: 3396 [default0]:Skipping sample id=2717606. Maximum sequence length: 2049, sample length: 2543 [default0]:Skipping sample id=2711558. Maximum sequence length: 2049, sample length: 4718 [default0]:Skipping sample id=2718655. Maximum sequence length: 2049, sample length: 2711 [default0]:Skipping sample id=2755305. Maximum sequence length: 2049, sample length: 3039 [default0]:Skipping sample id=2731749. Maximum sequence length: 2049, sample length: 2740 [default0]:Skipping sample id=2722677. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2750926. Maximum sequence length: 2049, sample length: 2493 [default0]:Skipping sample id=2720090. Maximum sequence length: 2049, sample length: 4832 [default0]:Skipping sample id=2724487. Maximum sequence length: 2049, sample length: 4020 [default0]:Skipping sample id=2716053. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2750132. Maximum sequence length: 2049, sample length: 3966 [default0]:Skipping sample id=2719998. Maximum sequence length: 2049, sample length: 3675 [default0]:Skipping sample id=2487242. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2484712. Maximum sequence length: 2049, sample length: 2309 [default0]:Skipping sample id=2727149. Maximum sequence length: 2049, sample length: 2598 [default0]:Skipping sample id=2718752. Maximum sequence length: 2049, sample length: 2735 [default0]:Skipping sample id=2724997. Maximum sequence length: 2049, sample length: 2140 [default0]:Skipping sample id=2720408. Maximum sequence length: 2049, sample length: 2199 [default0]:Skipping sample id=2716992. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2738834. Maximum sequence length: 2049, sample length: 6211 [default0]:Skipping sample id=2755742. Maximum sequence length: 2049, sample length: 5318 [default0]:Skipping sample id=2478180. 
Maximum sequence length: 2049, sample length: 2313 [default0]:Skipping sample id=2731856. Maximum sequence length: 2049, sample length: 6650 [default0]:Skipping sample id=2498210. Maximum sequence length: 2049, sample length: 2159 [default0]:Skipping sample id=2753601. Maximum sequence length: 2049, sample length: 3377 [default0]:Skipping sample id=2728853. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2726202. Maximum sequence length: 2049, sample length: 2079 [default0]:Skipping sample id=2725189. Maximum sequence length: 2049, sample length: 3443 [default0]:Skipping sample id=2718955. Maximum sequence length: 2049, sample length: 2438 [default0]:Skipping sample id=2735870. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2740988. Maximum sequence length: 2049, sample length: 3240 [default0]:Skipping sample id=2754264. Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2756921. Maximum sequence length: 2049, sample length: 3753 [default0]:Skipping sample id=2735200. Maximum sequence length: 2049, sample length: 2054 [default0]:Skipping sample id=2713459. Maximum sequence length: 2049, sample length: 2799 [default0]:Skipping sample id=2733447. Maximum sequence length: 2049, sample length: 2782 [default0]:Skipping sample id=2755496. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2725825. Maximum sequence length: 2049, sample length: 3053 [default0]:Skipping sample id=2717130. Maximum sequence length: 2049, sample length: 2726 [default0]:Skipping sample id=2744762. Maximum sequence length: 2049, sample length: 2588 [default0]:Skipping sample id=2720371. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2729605. Maximum sequence length: 2049, sample length: 2775 [default0]:Skipping sample id=2724796. Maximum sequence length: 2049, sample length: 4782 [default0]:Skipping sample id=2477151. 
Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2741858. Maximum sequence length: 2049, sample length: 3964 [default0]:Skipping sample id=2495915. Maximum sequence length: 2049, sample length: 2193 [default0]:Skipping sample id=2754840. Maximum sequence length: 2049, sample length: 2242 [default0]:Skipping sample id=2488359. Maximum sequence length: 2049, sample length: 2084 [default0]:Skipping sample id=2740250. Maximum sequence length: 2049, sample length: 2938 [default0]:Skipping sample id=2746818. Maximum sequence length: 2049, sample length: 3338 [default0]:Skipping sample id=2748968. Maximum sequence length: 2049, sample length: 2806 [default0]:Skipping sample id=2746158. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2717136. Maximum sequence length: 2049, sample length: 2696 [default0]:Skipping sample id=2741679. Maximum sequence length: 2049, sample length: 2605 [default0]:Skipping sample id=2492918. Maximum sequence length: 2049, sample length: 2751 [default0]:Skipping sample id=2740846. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2751459. Maximum sequence length: 2049, sample length: 3348 [default0]:Skipping sample id=2735222. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2723429. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2745067. Maximum sequence length: 2049, sample length: 2810 [default0]:Skipping sample id=2755458. Maximum sequence length: 2049, sample length: 3402 [default0]:Skipping sample id=2732905. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2729618. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2725592. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2736876. Maximum sequence length: 2049, sample length: 2594 [default0]:Skipping sample id=2727417. 
Maximum sequence length: 2049, sample length: 14247 [default0]:Skipping sample id=2480444. Maximum sequence length: 2049, sample length: 2154 [default0]:Skipping sample id=2725856. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2719838. Maximum sequence length: 2049, sample length: 2148 [default0]:Skipping sample id=2741686. Maximum sequence length: 2049, sample length: 2343 [default0]:Skipping sample id=2737111. Maximum sequence length: 2049, sample length: 5522 [default0]:Skipping sample id=2721659. Maximum sequence length: 2049, sample length: 2123 [default0]:Skipping sample id=2754467. Maximum sequence length: 2049, sample length: 3131 [default0]:Skipping sample id=2727222. Maximum sequence length: 2049, sample length: 2856 [default0]:Skipping sample id=2731121. Maximum sequence length: 2049, sample length: 3951 [default0]:Skipping sample id=2725123. Maximum sequence length: 2049, sample length: 3887 [default0]:Skipping sample id=2752871. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2723961. Maximum sequence length: 2049, sample length: 2315 [default0]:Skipping sample id=2754193. Maximum sequence length: 2049, sample length: 3551 [default0]:Skipping sample id=2756427. Maximum sequence length: 2049, sample length: 5343 [default0]:Skipping sample id=2729510. Maximum sequence length: 2049, sample length: 2600 [default0]:Skipping sample id=2738042. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2729468. Maximum sequence length: 2049, sample length: 3249 [default0]:Skipping sample id=2739615. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2735732. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2487914. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2471004. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2723400. 
Maximum sequence length: 2049, sample length: 3397 [default0]:Skipping sample id=2726109. Maximum sequence length: 2049, sample length: 2175 [default0]:Skipping sample id=2723409. Maximum sequence length: 2049, sample length: 2516 [default0]:Skipping sample id=2738000. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2716075. Maximum sequence length: 2049, sample length: 3158 [default0]:Skipping sample id=2748873. Maximum sequence length: 2049, sample length: 4826 [default0]:Skipping sample id=2718802. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2727693. Maximum sequence length: 2049, sample length: 2413 [default0]:Skipping sample id=2734430. Maximum sequence length: 2049, sample length: 2567 [default0]:Skipping sample id=2739316. Maximum sequence length: 2049, sample length: 2181 [default0]:Skipping sample id=2739526. Maximum sequence length: 2049, sample length: 2482 [default0]:Skipping sample id=2725598. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2726873. Maximum sequence length: 2049, sample length: 5177 [default0]:Skipping sample id=2478243. Maximum sequence length: 2049, sample length: 2651 [default0]:Skipping sample id=2711450. Maximum sequence length: 2049, sample length: 3113 [default0]:Skipping sample id=2724889. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2727671. Maximum sequence length: 2049, sample length: 2769 [default0]:Skipping sample id=2741361. Maximum sequence length: 2049, sample length: 3266 [default0]:Skipping sample id=2752197. Maximum sequence length: 2049, sample length: 5709 [default0]:Skipping sample id=2717773. Maximum sequence length: 2049, sample length: 2082 [default0]:Skipping sample id=2731498. Maximum sequence length: 2049, sample length: 3393 [default0]:Skipping sample id=2722279. Maximum sequence length: 2049, sample length: 3716 [default0]:Skipping sample id=2737692. 
Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2488951. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2734185. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2748617. Maximum sequence length: 2049, sample length: 2207 [default0]:Skipping sample id=2748206. Maximum sequence length: 2049, sample length: 6768 [default0]:Skipping sample id=2497310. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2745374. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2726253. Maximum sequence length: 2049, sample length: 3280 [default0]:Skipping sample id=2725218. Maximum sequence length: 2049, sample length: 3641 [default0]:Skipping sample id=2487279. Maximum sequence length: 2049, sample length: 3035 [default0]:Skipping sample id=2723850. Maximum sequence length: 2049, sample length: 3265 [default0]:Skipping sample id=2494435. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2718535. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2717960. Maximum sequence length: 2049, sample length: 2051 [default0]:Skipping sample id=2739332. Maximum sequence length: 2049, sample length: 2811 [default0]:Skipping sample id=2740682. Maximum sequence length: 2049, sample length: 2857 [default0]:Skipping sample id=2731013. Maximum sequence length: 2049, sample length: 3609 [default0]:Skipping sample id=2755011. Maximum sequence length: 2049, sample length: 2522 [default0]:Skipping sample id=2730790. Maximum sequence length: 2049, sample length: 2239 [default0]:Skipping sample id=2719755. Maximum sequence length: 2049, sample length: 3188 [default0]:Skipping sample id=2723143. Maximum sequence length: 2049, sample length: 2624 [default0]:Skipping sample id=2719427. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2730036. 
Maximum sequence length: 2049, sample length: 2468 [default0]:Skipping sample id=2723063. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2727754. Maximum sequence length: 2049, sample length: 3138 [default0]:Skipping sample id=2715069. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2749487. Maximum sequence length: 2049, sample length: 3342 [default0]:Skipping sample id=2755674. Maximum sequence length: 2049, sample length: 3630 [default0]:Skipping sample id=2753606. Maximum sequence length: 2049, sample length: 2643 [default0]:Skipping sample id=2740630. Maximum sequence length: 2049, sample length: 3272 [default0]:Skipping sample id=2748880. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2723373. Maximum sequence length: 2049, sample length: 3767 [default0]:Skipping sample id=2733729. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2739481. Maximum sequence length: 2049, sample length: 2272 [default0]:Skipping sample id=2716508. Maximum sequence length: 2049, sample length: 2437 [default0]:Skipping sample id=2722311. Maximum sequence length: 2049, sample length: 3231 [default0]:Skipping sample id=2740960. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2747144. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2489092. Maximum sequence length: 2049, sample length: 2125 [default0]:Skipping sample id=2470341. Maximum sequence length: 2049, sample length: 2357 [default0]:Skipping sample id=2720775. Maximum sequence length: 2049, sample length: 2232 [default0]:Skipping sample id=2493783. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2712083. Maximum sequence length: 2049, sample length: 2317 [default0]:Skipping sample id=2726889. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2493513. 
Maximum sequence length: 2049, sample length: 2344 [default0]:Skipping sample id=2751291. Maximum sequence length: 2049, sample length: 6405 [default0]:Skipping sample id=2754562. Maximum sequence length: 2049, sample length: 2837 [default0]:Skipping sample id=2744912. Maximum sequence length: 2049, sample length: 2580 [default0]:Skipping sample id=2734181. Maximum sequence length: 2049, sample length: 3625 [default0]:Skipping sample id=2718489. Maximum sequence length: 2049, sample length: 2472 [default0]:Skipping sample id=2752531. Maximum sequence length: 2049, sample length: 3203 [default0]:Skipping sample id=2736968. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2730741. Maximum sequence length: 2049, sample length: 2675 [default0]:Skipping sample id=2756703. Maximum sequence length: 2049, sample length: 3230 [default0]:Skipping sample id=2743925. Maximum sequence length: 2049, sample length: 2383 [default0]:Skipping sample id=2743460. Maximum sequence length: 2049, sample length: 2670 [default0]:Skipping sample id=2736698. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2718057. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2732698. Maximum sequence length: 2049, sample length: 4080 [default0]:Skipping sample id=2725500. Maximum sequence length: 2049, sample length: 2291 [default0]:Skipping sample id=2723805. Maximum sequence length: 2049, sample length: 3369 [default0]:Skipping sample id=2749526. Maximum sequence length: 2049, sample length: 2535 [default0]:Skipping sample id=2724005. Maximum sequence length: 2049, sample length: 2197 [default0]:Skipping sample id=2749158. Maximum sequence length: 2049, sample length: 3069 [default0]:Skipping sample id=2490842. Maximum sequence length: 2049, sample length: 2625 [default0]:Skipping sample id=2712938. Maximum sequence length: 2049, sample length: 2326 [default0]:Skipping sample id=2722308. 
Maximum sequence length: 2049, sample length: 2933 [default0]:Skipping sample id=2729637. Maximum sequence length: 2049, sample length: 2563 [default0]:Skipping sample id=2482967. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2719554. Maximum sequence length: 2049, sample length: 2976 [default0]:Skipping sample id=2735005. Maximum sequence length: 2049, sample length: 7607 [default0]:Skipping sample id=2748951. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2491059. Maximum sequence length: 2049, sample length: 2300 [default0]:Skipping sample id=2751344. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2715543. Maximum sequence length: 2049, sample length: 2384 [default0]:Skipping sample id=2747937. Maximum sequence length: 2049, sample length: 7074 [default0]:Skipping sample id=2731024. Maximum sequence length: 2049, sample length: 3163 [default0]:Skipping sample id=2724964. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2747185. Maximum sequence length: 2049, sample length: 2574 [default0]:Skipping sample id=2717417. Maximum sequence length: 2049, sample length: 3034 [default0]:Skipping sample id=2755906. Maximum sequence length: 2049, sample length: 2464 [default0]:Skipping sample id=2734715. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2485446. Maximum sequence length: 2049, sample length: 2763 [default0]:Skipping sample id=2744633. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2735846. Maximum sequence length: 2049, sample length: 2575 [default0]:Skipping sample id=2745194. Maximum sequence length: 2049, sample length: 2823 [default0]:Skipping sample id=2715427. Maximum sequence length: 2049, sample length: 4448 [default0]:Skipping sample id=2466673. Maximum sequence length: 2049, sample length: 2422 [default0]:Skipping sample id=2712878. 
Maximum sequence length: 2049, sample length: 3103 [default0]:Skipping sample id=2734735. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2726757. Maximum sequence length: 2049, sample length: 3193 [default0]:Skipping sample id=2486895. Maximum sequence length: 2049, sample length: 2349 [default0]:Skipping sample id=2730590. Maximum sequence length: 2049, sample length: 2392 [default0]:Skipping sample id=2739770. Maximum sequence length: 2049, sample length: 4520 [default0]:Skipping sample id=2724708. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2749701. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2713864. Maximum sequence length: 2049, sample length: 2424 [default0]:Skipping sample id=2730650. Maximum sequence length: 2049, sample length: 3629 [default0]:Skipping sample id=2750493. Maximum sequence length: 2049, sample length: 3366 [default0]:Skipping sample id=2728937. Maximum sequence length: 2049, sample length: 2447 [default0]:Skipping sample id=2493665. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2749652. Maximum sequence length: 2049, sample length: 3388 [default0]:Skipping sample id=2750155. Maximum sequence length: 2049, sample length: 4433 [default0]:Skipping sample id=2493321. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2747029. Maximum sequence length: 2049, sample length: 5843 [default0]:Skipping sample id=2734758. Maximum sequence length: 2049, sample length: 2959 [default0]:Skipping sample id=2466987. Maximum sequence length: 2049, sample length: 2481 [default0]:Skipping sample id=2748308. Maximum sequence length: 2049, sample length: 2302 [default0]:Skipping sample id=2727120. Maximum sequence length: 2049, sample length: 4500 [default0]:Skipping sample id=2712325. Maximum sequence length: 2049, sample length: 3801 [default0]:Skipping sample id=2726590. 
Maximum sequence length: 2049, sample length: 3499 [default0]:Skipping sample id=2739982. Maximum sequence length: 2049, sample length: 5406 [default0]:Skipping sample id=2742757. Maximum sequence length: 2049, sample length: 2565 [default0]:Skipping sample id=2720135. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2741698. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2751246. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2749585. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2720016. Maximum sequence length: 2049, sample length: 2897 [default0]:Skipping sample id=2745480. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2734324. Maximum sequence length: 2049, sample length: 2301 [default0]:Skipping sample id=2748428. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2499165. Maximum sequence length: 2049, sample length: 3093 [default0]:Skipping sample id=2499092. Maximum sequence length: 2049, sample length: 2218 [default0]:Skipping sample id=2741729. Maximum sequence length: 2049, sample length: 2875 [default0]:Skipping sample id=2721966. Maximum sequence length: 2049, sample length: 2546 [default0]:Skipping sample id=2751327. Maximum sequence length: 2049, sample length: 2473 [default0]:Skipping sample id=2751832. Maximum sequence length: 2049, sample length: 4871 [default0]:Skipping sample id=2494203. Maximum sequence length: 2049, sample length: 2255 [default0]:Skipping sample id=2466022. Maximum sequence length: 2049, sample length: 2318 [default0]:Skipping sample id=2720347. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2745739. Maximum sequence length: 2049, sample length: 2839 [default0]:Skipping sample id=2740365. Maximum sequence length: 2049, sample length: 2807 [default0]:Skipping sample id=2715674. 
Maximum sequence length: 2049, sample length: 2241 [default0]:Skipping sample id=2713498. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2726681. Maximum sequence length: 2049, sample length: 2586 [default0]:Skipping sample id=2755111. Maximum sequence length: 2049, sample length: 3412 [default0]:Skipping sample id=2499240. Maximum sequence length: 2049, sample length: 2780 [default0]:Skipping sample id=2716000. Maximum sequence length: 2049, sample length: 3986 [default0]:Skipping sample id=2740070. Maximum sequence length: 2049, sample length: 2761 [default0]:Skipping sample id=2723570. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2479416. Maximum sequence length: 2049, sample length: 2737 [default0]:Skipping sample id=2726237. Maximum sequence length: 2049, sample length: 2620 [default0]:Skipping sample id=2465739. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2753030. Maximum sequence length: 2049, sample length: 2536 [default0]:Skipping sample id=2745949. Maximum sequence length: 2049, sample length: 2866 [default0]:Skipping sample id=2489074. Maximum sequence length: 2049, sample length: 2240 [default0]:Skipping sample id=2751446. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2731756. Maximum sequence length: 2049, sample length: 2955 [default0]:Skipping sample id=2713732. Maximum sequence length: 2049, sample length: 2461 [default0]:Skipping sample id=2730530. Maximum sequence length: 2049, sample length: 2104 [default0]:Skipping sample id=2749943. Maximum sequence length: 2049, sample length: 3606 [default0]:Skipping sample id=2713342. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2728081. Maximum sequence length: 2049, sample length: 4242 [default0]:Skipping sample id=2716006. Maximum sequence length: 2049, sample length: 2431 [default0]:Skipping sample id=2712265. 
Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2724162. Maximum sequence length: 2049, sample length: 3048 [default0]:Skipping sample id=2477559. Maximum sequence length: 2049, sample length: 2198 [default0]:Skipping sample id=2753897. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2731805. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2727219. Maximum sequence length: 2049, sample length: 5634 [default0]:Skipping sample id=2483685. Maximum sequence length: 2049, sample length: 2638 [default0]:Skipping sample id=2727093. Maximum sequence length: 2049, sample length: 3355 [default0]:Skipping sample id=2730431. Maximum sequence length: 2049, sample length: 2290 [default0]:Skipping sample id=2724757. Maximum sequence length: 2049, sample length: 2088 [default0]:Skipping sample id=2723319. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2725318. Maximum sequence length: 2049, sample length: 2548 [default0]:Skipping sample id=2737757. Maximum sequence length: 2049, sample length: 4899 [default0]:Skipping sample id=2727329. Maximum sequence length: 2049, sample length: 3176 [default0]:Skipping sample id=2741253. Maximum sequence length: 2049, sample length: 3357 [default0]:Skipping sample id=2733401. Maximum sequence length: 2049, sample length: 2607 [default0]:Skipping sample id=2733933. Maximum sequence length: 2049, sample length: 2893 [default0]:Skipping sample id=2728447. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2731474. Maximum sequence length: 2049, sample length: 2381 [default0]:Skipping sample id=2733829. Maximum sequence length: 2049, sample length: 3722 [default0]:Skipping sample id=2493950. Maximum sequence length: 2049, sample length: 3592 [default0]:Skipping sample id=2737080. Maximum sequence length: 2049, sample length: 3673 [default0]:Skipping sample id=2721544. 
Maximum sequence length: 2049, sample length: 4521 [default0]:Skipping sample id=2747827. Maximum sequence length: 2049, sample length: 3190 [default0]:Skipping sample id=2745135. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2730757. Maximum sequence length: 2049, sample length: 2253 [default0]:Skipping sample id=2728506. Maximum sequence length: 2049, sample length: 2062 [default0]:Skipping sample id=2730318. Maximum sequence length: 2049, sample length: 3426 [default0]:Skipping sample id=2725313. Maximum sequence length: 2049, sample length: 2110 [default0]:Skipping sample id=2757073. Maximum sequence length: 2049, sample length: 2350 [default0]:Skipping sample id=2745797. Maximum sequence length: 2049, sample length: 2485 [default0]:Skipping sample id=2730739. Maximum sequence length: 2049, sample length: 2203 [default0]:Skipping sample id=2743402. Maximum sequence length: 2049, sample length: 5153 [default0]:Skipping sample id=2739057. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2713931. Maximum sequence length: 2049, sample length: 3518 [default0]:Skipping sample id=2468679. Maximum sequence length: 2049, sample length: 2945 [default0]:Skipping sample id=2750197. Maximum sequence length: 2049, sample length: 4247 [default0]:Skipping sample id=2730799. Maximum sequence length: 2049, sample length: 2280 [default0]:Skipping sample id=2744274. Maximum sequence length: 2049, sample length: 4124 [default0]:Skipping sample id=2723618. Maximum sequence length: 2049, sample length: 2479 [default0]:Skipping sample id=2730638. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2483152. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2729972. Maximum sequence length: 2049, sample length: 2989 [default0]:Skipping sample id=2750163. Maximum sequence length: 2049, sample length: 3415 [default0]:Skipping sample id=2736648. 
Maximum sequence length: 2049, sample length: 3550 [default0]:Skipping sample id=2729769. Maximum sequence length: 2049, sample length: 2719 [default0]:Skipping sample id=2726437. Maximum sequence length: 2049, sample length: 2867 [default0]:Skipping sample id=2716332. Maximum sequence length: 2049, sample length: 3509 [default0]:Skipping sample id=2718040. Maximum sequence length: 2049, sample length: 5821 [default0]:Skipping sample id=2717973. Maximum sequence length: 2049, sample length: 3008 [default0]:Skipping sample id=2741819. Maximum sequence length: 2049, sample length: 2329 [default0]:Skipping sample id=2711859. Maximum sequence length: 2049, sample length: 2997 [default0]:Skipping sample id=2734615. Maximum sequence length: 2049, sample length: 3020 [default0]:Skipping sample id=2491345. Maximum sequence length: 2049, sample length: 2180 [default0]:Skipping sample id=2713494. Maximum sequence length: 2049, sample length: 3394 [default0]:Skipping sample id=2470271. Maximum sequence length: 2049, sample length: 2852 [default0]:Skipping sample id=2730219. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2743488. Maximum sequence length: 2049, sample length: 2200 [default0]:Skipping sample id=2754195. Maximum sequence length: 2049, sample length: 2475 [default0]:Skipping sample id=2748451. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2738367. Maximum sequence length: 2049, sample length: 3239 [default0]:Skipping sample id=2745251. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2726324. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2747184. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2715365. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2743987. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2737213. 
Maximum sequence length: 2049, sample length: 3594 [default0]:Skipping sample id=2735162. Maximum sequence length: 2049, sample length: 4012 [default0]:Skipping sample id=2492625. Maximum sequence length: 2049, sample length: 2133 [default0]:Skipping sample id=2488629. Maximum sequence length: 2049, sample length: 2265 [default0]:Skipping sample id=2716946. Maximum sequence length: 2049, sample length: 3909 [default0]:Skipping sample id=2743437. Maximum sequence length: 2049, sample length: 3697 [default0]:Skipping sample id=2720875. Maximum sequence length: 2049, sample length: 3228 [default0]:Skipping sample id=2730333. Maximum sequence length: 2049, sample length: 2057 [default0]:Skipping sample id=2732589. Maximum sequence length: 2049, sample length: 3199 [default0]:Skipping sample id=2715350. Maximum sequence length: 2049, sample length: 4532 [default0]:Skipping sample id=2754796. Maximum sequence length: 2049, sample length: 5615 [default0]:Skipping sample id=2746653. Maximum sequence length: 2049, sample length: 2684 [default0]:Skipping sample id=2725072. Maximum sequence length: 2049, sample length: 2446 [default0]:Skipping sample id=2715639. Maximum sequence length: 2049, sample length: 3000 [default0]:Skipping sample id=2714263. Maximum sequence length: 2049, sample length: 6639 [default0]:Skipping sample id=2712198. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2736431. Maximum sequence length: 2049, sample length: 4063 [default0]:Skipping sample id=2711384. Maximum sequence length: 2049, sample length: 2067 [default0]:Skipping sample id=2745270. Maximum sequence length: 2049, sample length: 3268 [default0]:Skipping sample id=2725801. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2715216. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2716410. Maximum sequence length: 2049, sample length: 2571 [default0]:Skipping sample id=2740184. 
Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2491958. Maximum sequence length: 2049, sample length: 2308 [default0]:Skipping sample id=2716746. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2718518. Maximum sequence length: 2049, sample length: 3508 [default0]:Skipping sample id=2720250. Maximum sequence length: 2049, sample length: 2143 [default0]:Skipping sample id=2715017. Maximum sequence length: 2049, sample length: 2785 [default0]:Skipping sample id=2728457. Maximum sequence length: 2049, sample length: 3354 [default0]:Skipping sample id=2749104. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2742521. Maximum sequence length: 2049, sample length: 2664 [default0]:Skipping sample id=2746448. Maximum sequence length: 2049, sample length: 2102 [default0]:Skipping sample id=2755493. Maximum sequence length: 2049, sample length: 2403 [default0]:Skipping sample id=2747568. Maximum sequence length: 2049, sample length: 2771 [default0]:Skipping sample id=2736885. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2752864. Maximum sequence length: 2049, sample length: 5463 [default0]:Skipping sample id=2711387. Maximum sequence length: 2049, sample length: 2603 [default0]:Skipping sample id=2745249. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2726257. Maximum sequence length: 2049, sample length: 3650 [default0]:Skipping sample id=2717416. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2470641. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2735761. Maximum sequence length: 2049, sample length: 3798 [default0]:Skipping sample id=2735524. Maximum sequence length: 2049, sample length: 2061 [default0]:Skipping sample id=2734291. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2736600. 
Maximum sequence length: 2049, sample length: 2939 [default0]:Skipping sample id=2726162. Maximum sequence length: 2049, sample length: 2270 [default0]:Skipping sample id=2756644. Maximum sequence length: 2049, sample length: 2872 [default0]:Skipping sample id=2750537. Maximum sequence length: 2049, sample length: 2836 [default0]:Skipping sample id=2712512. Maximum sequence length: 2049, sample length: 4386 [default0]:Skipping sample id=2727771. Maximum sequence length: 2049, sample length: 3755 [default0]:Skipping sample id=2748392. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2748625. Maximum sequence length: 2049, sample length: 3248 [default0]:Skipping sample id=2746672. Maximum sequence length: 2049, sample length: 4416 [default0]:Skipping sample id=2734494. Maximum sequence length: 2049, sample length: 5413 [default0]:Skipping sample id=2746177. Maximum sequence length: 2049, sample length: 2183 [default0]:Skipping sample id=2755263. Maximum sequence length: 2049, sample length: 2364 [default0]:Skipping sample id=2719833. Maximum sequence length: 2049, sample length: 3956 [default0]:Skipping sample id=2494117. Maximum sequence length: 2049, sample length: 2119 [default0]:Skipping sample id=2495523. Maximum sequence length: 2049, sample length: 2212 [default0]:Skipping sample id=2747396. Maximum sequence length: 2049, sample length: 4133 [default0]:Skipping sample id=2733928. Maximum sequence length: 2049, sample length: 3016 [default0]:Skipping sample id=2741499. Maximum sequence length: 2049, sample length: 2779 [default0]:Skipping sample id=2743347. Maximum sequence length: 2049, sample length: 2323 [default0]:Skipping sample id=2722502. Maximum sequence length: 2049, sample length: 3005 [default0]:Skipping sample id=2716956. Maximum sequence length: 2049, sample length: 3485 [default0]:Skipping sample id=2734966. Maximum sequence length: 2049, sample length: 2223 [default0]:Skipping sample id=2466529. 
Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2730748. Maximum sequence length: 2049, sample length: 2332 [default0]:Skipping sample id=2730804. Maximum sequence length: 2049, sample length: 4796 [default0]:Skipping sample id=2712287. Maximum sequence length: 2049, sample length: 2559 [default0]:Skipping sample id=2725129. Maximum sequence length: 2049, sample length: 2389 [default0]:Skipping sample id=2756617. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2745522. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2746736. Maximum sequence length: 2049, sample length: 3157 [default0]:Skipping sample id=2720724. Maximum sequence length: 2049, sample length: 2106 [default0]:Skipping sample id=2477176. Maximum sequence length: 2049, sample length: 2124 [default0]:Skipping sample id=2752580. Maximum sequence length: 2049, sample length: 2351 [default0]:Skipping sample id=2718997. Maximum sequence length: 2049, sample length: 3165 [default0]:Skipping sample id=2467354. Maximum sequence length: 2049, sample length: 2503 [default0]:Skipping sample id=2743629. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2739586. Maximum sequence length: 2049, sample length: 3522 [default0]:Skipping sample id=2715419. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2753434. Maximum sequence length: 2049, sample length: 2870 [default0]:Skipping sample id=2468686. Maximum sequence length: 2049, sample length: 2096 [default0]:Skipping sample id=2717176. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2721108. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2728531. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2753326. Maximum sequence length: 2049, sample length: 2058 [default0]:Skipping sample id=2490660. 
Maximum sequence length: 2049, sample length: 2733 [default0]:Skipping sample id=2732268. Maximum sequence length: 2049, sample length: 2436 [default0]:Skipping sample id=2722372. Maximum sequence length: 2049, sample length: 5473 [default0]:Skipping sample id=2478597. Maximum sequence length: 2049, sample length: 2499 [default0]:Skipping sample id=2724381. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2748747. Maximum sequence length: 2049, sample length: 2889 [default0]:Skipping sample id=2736717. Maximum sequence length: 2049, sample length: 2819 [default0]:Skipping sample id=2727823. Maximum sequence length: 2049, sample length: 4538 [default0]:Skipping sample id=2755113. Maximum sequence length: 2049, sample length: 3145 [default0]:Skipping sample id=2723680. Maximum sequence length: 2049, sample length: 2064 [default0]:Skipping sample id=2719734. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2467583. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2742224. Maximum sequence length: 2049, sample length: 3019 [default0]:Skipping sample id=2714128. Maximum sequence length: 2049, sample length: 2131 [default0]:Skipping sample id=2720304. Maximum sequence length: 2049, sample length: 3060 [default0]:Skipping sample id=2713217. Maximum sequence length: 2049, sample length: 2374 [default0]:Skipping sample id=2739164. Maximum sequence length: 2049, sample length: 3923 [default0]:Skipping sample id=2746660. Maximum sequence length: 2049, sample length: 2572 [default0]:Skipping sample id=2488400. Maximum sequence length: 2049, sample length: 2616 [default0]:Skipping sample id=2739548. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2488568. Maximum sequence length: 2049, sample length: 2375 [default0]:Skipping sample id=2726592. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2721374. 
Maximum sequence length: 2049, sample length: 3994 [default0]:Skipping sample id=2727399. Maximum sequence length: 2049, sample length: 2256 [default0]:Skipping sample id=2734993. Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2741181. Maximum sequence length: 2049, sample length: 2686 [default0]:Skipping sample id=2749066. Maximum sequence length: 2049, sample length: 2478 [default0]:Skipping sample id=2751436. Maximum sequence length: 2049, sample length: 4874 [default0]:Skipping sample id=2733160. Maximum sequence length: 2049, sample length: 2072 [default0]:Skipping sample id=2483357. Maximum sequence length: 2049, sample length: 2250 [default0]:Skipping sample id=2754632. Maximum sequence length: 2049, sample length: 4552 [default0]:Skipping sample id=2721585. Maximum sequence length: 2049, sample length: 2509 [default0]:Skipping sample id=2736709. Maximum sequence length: 2049, sample length: 2066 [default0]:Skipping sample id=2752527. Maximum sequence length: 2049, sample length: 3910 [default0]:Skipping sample id=2714855. Maximum sequence length: 2049, sample length: 2114 [default0]:Skipping sample id=2753122. Maximum sequence length: 2049, sample length: 2743 [default0]:Skipping sample id=2731895. Maximum sequence length: 2049, sample length: 2427 [default0]:Skipping sample id=2751387. Maximum sequence length: 2049, sample length: 5272 [default0]:Skipping sample id=2750718. Maximum sequence length: 2049, sample length: 2847 [default0]:Skipping sample id=2724959. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2730093. Maximum sequence length: 2049, sample length: 2419 [default0]:Skipping sample id=2744696. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2738214. Maximum sequence length: 2049, sample length: 3372 [default0]:Skipping sample id=2729709. Maximum sequence length: 2049, sample length: 3067 [default0]:Skipping sample id=2484589. 
Maximum sequence length: 2049, sample length: 2107 [default0]:Skipping sample id=2730679. Maximum sequence length: 2049, sample length: 2391 [default0]:Skipping sample id=2752468. Maximum sequence length: 2049, sample length: 2359 [default0]:Skipping sample id=2752080. Maximum sequence length: 2049, sample length: 2295 [default0]:Skipping sample id=2716169. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2724308. Maximum sequence length: 2049, sample length: 2342 [default0]:Skipping sample id=2741563. Maximum sequence length: 2049, sample length: 2360 [default0]:Skipping sample id=2714793. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2740320. Maximum sequence length: 2049, sample length: 2491 [default0]:Skipping sample id=2746119. Maximum sequence length: 2049, sample length: 2880 [default0]:Skipping sample id=2719559. Maximum sequence length: 2049, sample length: 2745 [default0]:Skipping sample id=2751179. Maximum sequence length: 2049, sample length: 2962 [default0]:Skipping sample id=2714565. Maximum sequence length: 2049, sample length: 2674 [default0]:Skipping sample id=2745268. Maximum sequence length: 2049, sample length: 4040 [default0]:Skipping sample id=2730058. Maximum sequence length: 2049, sample length: 2152 [default0]:Skipping sample id=2724863. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2489578. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2493128. Maximum sequence length: 2049, sample length: 2444 [default0]:Skipping sample id=2714683. Maximum sequence length: 2049, sample length: 4209 [default0]:Skipping sample id=2737065. Maximum sequence length: 2049, sample length: 3308 [default0]:Skipping sample id=2730436. Maximum sequence length: 2049, sample length: 3825 [default0]:Skipping sample id=2737732. Maximum sequence length: 2049, sample length: 4216 [default0]:Skipping sample id=2750015. 
Maximum sequence length: 2049, sample length: 2120 [default0]:Skipping sample id=2728588. Maximum sequence length: 2049, sample length: 2397 [default0]:Skipping sample id=2756324. Maximum sequence length: 2049, sample length: 2327 [default0]:Skipping sample id=2715697. Maximum sequence length: 2049, sample length: 2722 [default0]:Skipping sample id=2734224. Maximum sequence length: 2049, sample length: 2320 [default0]:Skipping sample id=2729790. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2726151. Maximum sequence length: 2049, sample length: 2347 [default0]:Skipping sample id=2493308. Maximum sequence length: 2049, sample length: 2459 [default0]:Skipping sample id=2711022. Maximum sequence length: 2049, sample length: 2831 [default0]:Skipping sample id=2714195. Maximum sequence length: 2049, sample length: 2288 [default0]:Skipping sample id=2749292. Maximum sequence length: 2049, sample length: 3386 [default0]:Skipping sample id=2747823. Maximum sequence length: 2049, sample length: 2519 [default0]:Skipping sample id=2731031. Maximum sequence length: 2049, sample length: 3305 [default0]:Skipping sample id=2711557. Maximum sequence length: 2049, sample length: 4112 [default0]:Skipping sample id=2737837. Maximum sequence length: 2049, sample length: 2260 [default0]:Skipping sample id=2732245. Maximum sequence length: 2049, sample length: 3497 [default0]:Skipping sample id=2493310. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2715594. Maximum sequence length: 2049, sample length: 2501 [default0]:Skipping sample id=2485341. Maximum sequence length: 2049, sample length: 2252 [default0]:Skipping sample id=2755919. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2726517. Maximum sequence length: 2049, sample length: 2926 [default0]:Skipping sample id=2712911. Maximum sequence length: 2049, sample length: 5821 [default0]:Skipping sample id=2736585. 
Maximum sequence length: 2049, sample length: 5047 [default0]:Skipping sample id=2749493. Maximum sequence length: 2049, sample length: 3763 [default0]:Skipping sample id=2746467. Maximum sequence length: 2049, sample length: 2236 [default0]:Skipping sample id=2484233. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2721784. Maximum sequence length: 2049, sample length: 4690 [default0]:Skipping sample id=2713668. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2729418. Maximum sequence length: 2049, sample length: 2647 [default0]:Skipping sample id=2491706. Maximum sequence length: 2049, sample length: 2453 [default0]:Skipping sample id=2712905. Maximum sequence length: 2049, sample length: 2658 [default0]:Skipping sample id=2751614. Maximum sequence length: 2049, sample length: 3944 [default0]:Skipping sample id=2752460. Maximum sequence length: 2049, sample length: 3099 [default0]:Skipping sample id=2466121. Maximum sequence length: 2049, sample length: 2698 [default0]:Skipping sample id=2749092. Maximum sequence length: 2049, sample length: 6218 [default0]:Skipping sample id=2731690. Maximum sequence length: 2049, sample length: 3617 [default0]:Skipping sample id=2714399. Maximum sequence length: 2049, sample length: 2702 [default0]:Skipping sample id=2721921. Maximum sequence length: 2049, sample length: 4091 [default0]:Skipping sample id=2726137. Maximum sequence length: 2049, sample length: 2752 [default0]:Skipping sample id=2747732. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2747926. Maximum sequence length: 2049, sample length: 3084 [default0]:Skipping sample id=2470530. Maximum sequence length: 2049, sample length: 2144 [default0]:Skipping sample id=2715893. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2716384. Maximum sequence length: 2049, sample length: 5086 [default0]:Skipping sample id=2756419. 
Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2712311. Maximum sequence length: 2049, sample length: 2979 [default0]:Skipping sample id=2729193. Maximum sequence length: 2049, sample length: 3160 [default0]:Skipping sample id=2733678. Maximum sequence length: 2049, sample length: 2666 [default0]:Skipping sample id=2745783. Maximum sequence length: 2049, sample length: 3076 [default0]:Skipping sample id=2729083. Maximum sequence length: 2049, sample length: 2146 [default0]:Skipping sample id=2754416. Maximum sequence length: 2049, sample length: 4407 [default0]:Skipping sample id=2746916. Maximum sequence length: 2049, sample length: 2693 [default0]:Skipping sample id=2752755. Maximum sequence length: 2049, sample length: 2184 [default0]:Skipping sample id=2496972. Maximum sequence length: 2049, sample length: 3238 [default0]:Skipping sample id=2748054. Maximum sequence length: 2049, sample length: 4959 [default0]:Skipping sample id=2726007. Maximum sequence length: 2049, sample length: 3403 [default0]:Skipping sample id=2718871. Maximum sequence length: 2049, sample length: 2129 [default0]:Skipping sample id=2745558. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2754174. Maximum sequence length: 2049, sample length: 3298 [default0]:Skipping sample id=2497328. Maximum sequence length: 2049, sample length: 2457 [default0]:Skipping sample id=2739140. Maximum sequence length: 2049, sample length: 3541 [default0]:Skipping sample id=2741289. Maximum sequence length: 2049, sample length: 3225 [default0]:Skipping sample id=2752193. Maximum sequence length: 2049, sample length: 2276 [default0]:Skipping sample id=2718061. Maximum sequence length: 2049, sample length: 2356 [default0]:Skipping sample id=2735487. Maximum sequence length: 2049, sample length: 2289 [default0]:Skipping sample id=2722456. Maximum sequence length: 2049, sample length: 2941 [default0]:Skipping sample id=2717500. 
Maximum sequence length: 2049, sample length: 2882 [default0]:Skipping sample id=2754355. Maximum sequence length: 2049, sample length: 6165 [default0]:Skipping sample id=2721416. Maximum sequence length: 2049, sample length: 4917 [default0]:Skipping sample id=2745092. Maximum sequence length: 2049, sample length: 2282 [default0]:Skipping sample id=2737405. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2738349. Maximum sequence length: 2049, sample length: 2186 [default0]:Skipping sample id=2484154. Maximum sequence length: 2049, sample length: 2790 [default0]:Skipping sample id=2745712. Maximum sequence length: 2049, sample length: 2671 [default0]:Skipping sample id=2723123. Maximum sequence length: 2049, sample length: 3241 [default0]:Skipping sample id=2747663. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2750484. Maximum sequence length: 2049, sample length: 2164 [default0]:Skipping sample id=2711636. Maximum sequence length: 2049, sample length: 3566 [default0]:Skipping sample id=2722599. Maximum sequence length: 2049, sample length: 2078 [default0]:Skipping sample id=2718355. Maximum sequence length: 2049, sample length: 2653 [default0]:Skipping sample id=2753927. Maximum sequence length: 2049, sample length: 2881 [default0]:Skipping sample id=2731670. Maximum sequence length: 2049, sample length: 2050 [default0]:Skipping sample id=2494131. Maximum sequence length: 2049, sample length: 2316 [default0]:Skipping sample id=2751757. Maximum sequence length: 2049, sample length: 2786 [default0]:Skipping sample id=2719416. Maximum sequence length: 2049, sample length: 2883 [default0]:Skipping sample id=2716411. Maximum sequence length: 2049, sample length: 2617 [default0]:Skipping sample id=2726727. Maximum sequence length: 2049, sample length: 6651 [default0]:Skipping sample id=2717093. Maximum sequence length: 2049, sample length: 4160 [default0]:Skipping sample id=2737194. 
Maximum sequence length: 2049, sample length: 2353 [default0]:Skipping sample id=2738807. Maximum sequence length: 2049, sample length: 2469 [default0]:Skipping sample id=2752031. Maximum sequence length: 2049, sample length: 4265 [default0]:Skipping sample id=2493466. Maximum sequence length: 2049, sample length: 2222 [default0]:Skipping sample id=2721349. Maximum sequence length: 2049, sample length: 3281 [default0]:Skipping sample id=2731480. Maximum sequence length: 2049, sample length: 2311 [default0]:Skipping sample id=2741277. Maximum sequence length: 2049, sample length: 2901 [default0]:Skipping sample id=2732409. Maximum sequence length: 2049, sample length: 2467 [default0]:Skipping sample id=2481748. Maximum sequence length: 2049, sample length: 2257 [default0]:Skipping sample id=2739736. Maximum sequence length: 2049, sample length: 5074 [default0]:Skipping sample id=2718029. Maximum sequence length: 2049, sample length: 2466 [default0]:Skipping sample id=2726234. Maximum sequence length: 2049, sample length: 2407 [default0]:Skipping sample id=2711050. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2728958. Maximum sequence length: 2049, sample length: 2127 [default0]:Skipping sample id=2492951. Maximum sequence length: 2049, sample length: 3109 [default0]:Skipping sample id=2711120. Maximum sequence length: 2049, sample length: 2712 [default0]:Skipping sample id=2746553. Maximum sequence length: 2049, sample length: 3111 [default0]:Skipping sample id=2755563. Maximum sequence length: 2049, sample length: 2053 [default0]:Skipping sample id=2491841. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2743792. Maximum sequence length: 2049, sample length: 2851 [default0]:Skipping sample id=2467696. Maximum sequence length: 2049, sample length: 2248 [default0]:Skipping sample id=2718452. Maximum sequence length: 2049, sample length: 6328 [default0]:Skipping sample id=2713348. 
Maximum sequence length: 2049, sample length: 2059 [default0]:Skipping sample id=2739144. Maximum sequence length: 2049, sample length: 2613 [default0]:Skipping sample id=2751681. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2728055. Maximum sequence length: 2049, sample length: 3705 [default0]:Skipping sample id=2719117. Maximum sequence length: 2049, sample length: 2764 [default0]:Skipping sample id=2714636. Maximum sequence length: 2049, sample length: 3811 [default0]:Skipping sample id=2480728. Maximum sequence length: 2049, sample length: 2273 [default0]:Skipping sample id=2742405. Maximum sequence length: 2049, sample length: 4129 [default0]:Skipping sample id=2466605. Maximum sequence length: 2049, sample length: 2130 [default0]:Skipping sample id=2718123. Maximum sequence length: 2049, sample length: 4719 [default0]:Skipping sample id=2483394. Maximum sequence length: 2049, sample length: 2221 [default0]:Skipping sample id=2721714. Maximum sequence length: 2049, sample length: 2150 [default0]:Skipping sample id=2747374. Maximum sequence length: 2049, sample length: 2652 [default0]:Skipping sample id=2730427. Maximum sequence length: 2049, sample length: 3824 [default0]:Skipping sample id=2470565. Maximum sequence length: 2049, sample length: 2358 [default0]:Skipping sample id=2713301. Maximum sequence length: 2049, sample length: 2934 [default0]:Skipping sample id=2756308. Maximum sequence length: 2049, sample length: 2960 [default0]:Skipping sample id=2753596. Maximum sequence length: 2049, sample length: 3017 [default0]:Skipping sample id=2723621. Maximum sequence length: 2049, sample length: 2622 [default0]:Skipping sample id=2495717. Maximum sequence length: 2049, sample length: 2188 [default0]:Skipping sample id=2736492. Maximum sequence length: 2049, sample length: 2390 [default0]:Skipping sample id=2483057. Maximum sequence length: 2049, sample length: 2100 [default0]:Skipping sample id=2742705. 
Maximum sequence length: 2049, sample length: 3645 [default0]:Skipping sample id=2716557. Maximum sequence length: 2049, sample length: 3289 [default0]:Skipping sample id=2731305. Maximum sequence length: 2049, sample length: 2476 [default0]:Skipping sample id=2755144. Maximum sequence length: 2049, sample length: 2118 [default0]:Skipping sample id=2718794. Maximum sequence length: 2049, sample length: 3636 [default0]:Skipping sample id=2756802. Maximum sequence length: 2049, sample length: 2542 [default0]:Skipping sample id=2743338. Maximum sequence length: 2049, sample length: 3011 [default0]:Skipping sample id=2721503. Maximum sequence length: 2049, sample length: 2902 [default0]:Skipping sample id=2467577. Maximum sequence length: 2049, sample length: 3516 [default0]:Skipping sample id=2734711. Maximum sequence length: 2049, sample length: 2278 [default0]:Skipping sample id=2723416. Maximum sequence length: 2049, sample length: 2992 [default0]:Skipping sample id=2746455. Maximum sequence length: 2049, sample length: 2411 [default0]:Skipping sample id=2470874. Maximum sequence length: 2049, sample length: 2297 [default0]:Skipping sample id=2753082. Maximum sequence length: 2049, sample length: 2840 [default0]:Skipping sample id=2738412. Maximum sequence length: 2049, sample length: 3959 [default0]:Skipping sample id=2719771. Maximum sequence length: 2049, sample length: 3856 [default0]:Skipping sample id=2715680. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2731315. Maximum sequence length: 2049, sample length: 4557 [default0]:Skipping sample id=2747481. Maximum sequence length: 2049, sample length: 2263 [default0]:Skipping sample id=2741039. Maximum sequence length: 2049, sample length: 2366 [default0]:Skipping sample id=2756986. Maximum sequence length: 2049, sample length: 2820 [default0]:Skipping sample id=2748881. Maximum sequence length: 2049, sample length: 3810 [default0]:Skipping sample id=2738659. 
Maximum sequence length: 2049, sample length: 3300 [default0]:Skipping sample id=2741186. Maximum sequence length: 2049, sample length: 3091 [default0]:Skipping sample id=2727162. Maximum sequence length: 2049, sample length: 2345 [default0]:Skipping sample id=2724510. Maximum sequence length: 2049, sample length: 4544 [default0]:Skipping sample id=2714870. Maximum sequence length: 2049, sample length: 2065 [default0]:Skipping sample id=2742988. Maximum sequence length: 2049, sample length: 3072 [default0]:Skipping sample id=2483382. Maximum sequence length: 2049, sample length: 2877 [default0]:Skipping sample id=2717159. Maximum sequence length: 2049, sample length: 14257 [default0]:Skipping sample id=2750794. Maximum sequence length: 2049, sample length: 4580 [default0]:Skipping sample id=2466220. Maximum sequence length: 2049, sample length: 3178 [default0]:Skipping sample id=2720692. Maximum sequence length: 2049, sample length: 2284 [default0]:Skipping sample id=2715149. Maximum sequence length: 2049, sample length: 2460 [default0]:Skipping sample id=2713260. Maximum sequence length: 2049, sample length: 3582 [default0]:Skipping sample id=2738356. Maximum sequence length: 2049, sample length: 2331 [default0]:Skipping sample id=2742366. Maximum sequence length: 2049, sample length: 2261 [default0]:Skipping sample id=2739963. Maximum sequence length: 2049, sample length: 2281 [default0]:Skipping sample id=2744887. Maximum sequence length: 2049, sample length: 2692 [default0]:Skipping sample id=2735903. Maximum sequence length: 2049, sample length: 3438 [default0]:Skipping sample id=2741619. Maximum sequence length: 2049, sample length: 2710 [default0]:Skipping sample id=2755916. Maximum sequence length: 2049, sample length: 2514 [default0]:Skipping sample id=2712184. Maximum sequence length: 2049, sample length: 2074 [default0]:Skipping sample id=2726465. Maximum sequence length: 2049, sample length: 2244 [default0]:Skipping sample id=2723024. 
Maximum sequence length: 2049, sample length: 2793 [default0]:Skipping sample id=2714982. Maximum sequence length: 2049, sample length: 2153 [default0]:Skipping sample id=2750728. Maximum sequence length: 2049, sample length: 2191 [default0]:Skipping sample id=2741894. Maximum sequence length: 2049, sample length: 2662 [default0]:Skipping sample id=2725527. Maximum sequence length: 2049, sample length: 3851 [default0]:Skipping sample id=2729701. Maximum sequence length: 2049, sample length: 2760 [default0]:Skipping sample id=2726427. Maximum sequence length: 2049, sample length: 2688 [default0]:Skipping sample id=2729201. Maximum sequence length: 2049, sample length: 2713 [default0]:Skipping sample id=2748818. Maximum sequence length: 2049, sample length: 2112 [default0]:Skipping sample id=2466993. Maximum sequence length: 2049, sample length: 2581 [default0]:Skipping sample id=2722646. Maximum sequence length: 2049, sample length: 4139 [default0]:Skipping sample id=2747246. Maximum sequence length: 2049, sample length: 3632 [default0]:Skipping sample id=2714694. Maximum sequence length: 2049, sample length: 3303 [default0]:Skipping sample id=2749771. Maximum sequence length: 2049, sample length: 2948 [default0]:Skipping sample id=2756444. Maximum sequence length: 2049, sample length: 2220 [default0]:Skipping sample id=2716653. Maximum sequence length: 2049, sample length: 2805 [default0]:Skipping sample id=2724006. Maximum sequence length: 2049, sample length: 2554 [default0]:Skipping sample id=2719220. Maximum sequence length: 2049, sample length: 5313 [default0]:Skipping sample id=2744454. Maximum sequence length: 2049, sample length: 2352 [default0]:Skipping sample id=2732741. Maximum sequence length: 2049, sample length: 2943 [default0]:Skipping sample id=2496282. Maximum sequence length: 2049, sample length: 2739 [default0]:Skipping sample id=2469485. Maximum sequence length: 2049, sample length: 2277 [default0]:Skipping sample id=2481936. 
Maximum sequence length: 2049, sample length: 2158 [default0]:Skipping sample id=2753915. Maximum sequence length: 2049, sample length: 2151 [default0]:Skipping sample id=2729681. Maximum sequence length: 2049, sample length: 4319 [default0]:Skipping sample id=2751821. Maximum sequence length: 2049, sample length: 2368 [default0]:Skipping sample id=2465933. Maximum sequence length: 2049, sample length: 2128 [default0]:Skipping sample id=2713704. Maximum sequence length: 2049, sample length: 2387 [default0]:Skipping sample id=2489913. Maximum sequence length: 2049, sample length: 2205 [default0]:Skipping sample id=2750677. Maximum sequence length: 2049, sample length: 3321 [default0]:Skipping sample id=2727850. Maximum sequence length: 2049, sample length: 2449 [default0]:Skipping sample id=2730771. Maximum sequence length: 2049, sample length: 2340 [default0]:Skipping sample id=2717956. Maximum sequence length: 2049, sample length: 2708 [default0]:Skipping sample id=2723853. Maximum sequence length: 2049, sample length: 2073 [default0]:Skipping sample id=2724903. Maximum sequence length: 2049, sample length: 2630 [default0]:Skipping sample id=2717405. Maximum sequence length: 2049, sample length: 2961 [default0]:Skipping sample id=2740941. Maximum sequence length: 2049, sample length: 2917 [default0]:Skipping sample id=2738384. Maximum sequence length: 2049, sample length: 6003 [default0]:Skipping sample id=2488144. Maximum sequence length: 2049, sample length: 2216 [default0]:Skipping sample id=2752690. Maximum sequence length: 2049, sample length: 2409 [default0]:Skipping sample id=2739662. Maximum sequence length: 2049, sample length: 3216 [default0]:Skipping sample id=2744686. Maximum sequence length: 2049, sample length: 2156 [default0]:Skipping sample id=2498714. Maximum sequence length: 2049, sample length: 3037 [default0]:Skipping sample id=2490272. Maximum sequence length: 2049, sample length: 2268 [default0]:Skipping sample id=2729737. 
Maximum sequence length: 2049, sample length: 2226 [default0]:Skipping sample id=2738669. Maximum sequence length: 2049, sample length: 2589 [default0]:Skipping sample id=2728668. Maximum sequence length: 2049, sample length: 3903 [default0]:Skipping sample id=2742455. Maximum sequence length: 2049, sample length: 2454 [default0]:Skipping sample id=2714845. Maximum sequence length: 2049, sample length: 3822 [default0]:Skipping sample id=2477324. Maximum sequence length: 2049, sample length: 2377 [default0]:Skipping sample id=2735090. Maximum sequence length: 2049, sample length: 2206 [default0]:Skipping sample id=2734583. Maximum sequence length: 2049, sample length: 2738 [default0]:Skipping sample id=2716207. Maximum sequence length: 2049, sample length: 2254 [default0]:Skipping sample id=2719158. Maximum sequence length: 2049, sample length: 3304 [default0]:Skipping sample id=2734351. Maximum sequence length: 2049, sample length: 2307 [default0]:Skipping sample id=2491749. Maximum sequence length: 2049, sample length: 2425 [default0]:Skipping sample id=2728192. Maximum sequence length: 2049, sample length: 2520 [default0]:Skipping sample id=2718576. Maximum sequence length: 2049, sample length: 2111 [default0]:Skipping sample id=2755063. Maximum sequence length: 2049, sample length: 2488 [default0]:Skipping sample id=2736493. Maximum sequence length: 2049, sample length: 2135 [default0]:Skipping sample id=2753183. Maximum sequence length: 2049, sample length: 4958 [default0]:Skipping sample id=2718979. Maximum sequence length: 2049, sample length: 3182 [default0]:Skipping sample id=2711813. Maximum sequence length: 2049, sample length: 7283 [default0]:Skipping sample id=2746981. Maximum sequence length: 2049, sample length: 2229 [default0]:Skipping sample id=2747143. Maximum sequence length: 2049, sample length: 2767 [default0]:Skipping sample id=2736582. Maximum sequence length: 2049, sample length: 2325 [default0]:Skipping sample id=2732719. 
Maximum sequence length: 2049, sample length: 5933 [default0]:Skipping sample id=2741470. Maximum sequence length: 2049, sample length: 2141 [default0]:Skipping sample id=2468152. Maximum sequence length: 2049, sample length: 2251 [default0]:Skipping sample id=2746590. Maximum sequence length: 2049, sample length: 2946 [default0]:Skipping sample id=2745226. Maximum sequence length: 2049, sample length: 6417 [default0]:Skipping sample id=2713564. Maximum sequence length: 2049, sample length: 4201 [default0]:Skipping sample id=2735264. Maximum sequence length: 2049, sample length: 2339 [default0]:Skipping sample id=2721864. Maximum sequence length: 2049, sample length: 3337 [default0]:Skipping sample id=2749817. Maximum sequence length: 2049, sample length: 2537 [default0]:Skipping sample id=2756944. Maximum sequence length: 2049, sample length: 2920 [default0]:Skipping sample id=2735505. Maximum sequence length: 2049, sample length: 3737 [default0]:Skipping sample id=2734296. Maximum sequence length: 2049, sample length: 2294 [default0]:Skipping sample id=2735988. Maximum sequence length: 2049, sample length: 4491 [default0]:Skipping sample id=2491518. Maximum sequence length: 2049, sample length: 2812 [default0]:Skipping sample id=2720246. Maximum sequence length: 2049, sample length: 2909 [default0]:Skipping sample id=2745815. Maximum sequence length: 2049, sample length: 5010 [default0]:Skipping sample id=2745683. 
Maximum sequence length: 2049, sample length: 3234
[default5]:GOTCONSUMEDSAMPLES 178402224 5030720
[default0]:Traceback (most recent call last):
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default0]:    main()
[default0]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default0]:    return f(*args, **kwargs)
[default0]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default0]:    pretrain(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default0]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default0]:    train_dataloader = build_pretraining_data_loader(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default0]:    batch_sampler = MegatronPretrainingSampler(
[default0]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default0]:    assert self.consumed_samples < self.total_samples, \
[default0]:AssertionError: no samples left to consume: 178402224, 12547659
[default1]:GOTCONSUMEDSAMPLES 178402224 5030720
[default6]:GOTCONSUMEDSAMPLES 178402224 5030720
[default4]:Traceback (most recent call last):
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]:    main()
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]:    return f(*args, **kwargs)
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720
[default4]:    pretrain(
[default4]:GOTCONSUMEDSAMPLES 178402224 5030720
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default4]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default4]:    train_dataloader = build_pretraining_data_loader(
[default7]:GOTCONSUMEDSAMPLES 178402224 5030720
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default4]:    batch_sampler = MegatronPretrainingSampler(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default4]:    assert self.consumed_samples < self.total_samples, \
[default4]:AssertionError: no samples left to consume: 178402224, 12547659
[default3]:Traceback (most recent call last):
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]:    main()
[default3]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]:    return f(*args, **kwargs)
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]:    pretrain(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default3]:    train_dataloader = build_pretraining_data_loader(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default3]:    batch_sampler = MegatronPretrainingSampler(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default3]:    assert self.consumed_samples < self.total_samples, \
[default3]:AssertionError: no samples left to consume: 178402224, 12547659
[default1]:Traceback (most recent call last):
[default1]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default1]:    main()
[default1]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default1]:    return f(*args, **kwargs)
[default1]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default1]:    pretrain(
[default1]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default1]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default1]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default1]:    train_dataloader = build_pretraining_data_loader(
[default1]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default1]:    batch_sampler = MegatronPretrainingSampler(
[default1]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default1]:    assert self.consumed_samples < self.total_samples, \
[default1]:AssertionError: no samples left to consume: 178402224, 12547659
[default2]:Traceback (most recent call last):
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default2]:    main()
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]:    return f(*args, **kwargs)
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]:    pretrain(
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]:  File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:Traceback (most recent call last): [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default4]:Traceback (most recent call last): [default1]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: > elasped time to build and save shuffle-idx and sample-idx mapping (seconds): 7.597532 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.011 seconds [default0]:> finished creating T0 datasets ... 
[default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader 
= build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default3]: 
return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default2]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default2]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: pretrain( [default1]: return f(*args, **kwargs) [default1]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]: main() [default2]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: batch_sampler = MegatronPretrainingSampler( [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: assert self.consumed_samples < self.total_samples, \ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: train_dataloader = build_pretraining_data_loader( [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: assert self.consumed_samples < self.total_samples, \ 
[default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: batch_sampler = MegatronPretrainingSampler( [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 
178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: 
pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default6]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default6]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, 
\ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main()
[default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = 
build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = 
build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < 
self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main()
[default7]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) 
[default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: 
pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return 
f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: batch_sampler = MegatronPretrainingSampler( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: 
pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]:
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader(
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]:
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args,
**kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader =
build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]:Traceback (most recent call last): [default1]: File
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert 
self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \
[default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES
178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:GOTCONSUMEDSAMPLES 178402224 
5030720 [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Traceback (most recent call 
last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default1]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples 
left to consume: 178402224, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = 
build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = 
MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, 
in [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default2]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) 
[default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default1]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return 
f(*args, **kwargs) [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: main() [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: batch_sampler = MegatronPretrainingSampler( [default4]: assert self.consumed_samples < self.total_samples, \ [default0]: return f(*args, **kwargs) [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: main() [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler 
= MegatronPretrainingSampler( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: assert self.consumed_samples < self.total_samples, \ [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: pretrain( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 1207, in build_train_valid_test_data_iterators [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: train_dataloader = build_pretraining_data_loader( [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:Traceback (most recent call last): [default3]: main() [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: assert self.consumed_samples < self.total_samples, \ [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: batch_sampler = MegatronPretrainingSampler( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 
178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: pretrain( [default3]: return f(*args, **kwargs) [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: main() [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: train_dataloader = build_pretraining_data_loader( [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: batch_sampler 
= MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]:GOTCONSUMEDSAMPLES 
178402224 5030720 [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ 
[default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]: return f(*args, **kwargs) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = 
build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default4]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: pretrain( [default2]: return f(*args, **kwargs) [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: return f(*args, **kwargs) [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]:Traceback (most recent call last): [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: train_dataloader = build_pretraining_data_loader( [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: 
main() [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]:Traceback (most recent call last): [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 
178402224 5030720 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default0]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: 
return f(*args, **kwargs) [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: batch_sampler = MegatronPretrainingSampler( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, 
in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: train_dataloader = build_pretraining_data_loader( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: assert self.consumed_samples < self.total_samples, \ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ 
[default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: 
batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ 
[default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: return f(*args, **kwargs) [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return 
f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: pretrain( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = 
MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]: return f(*args, **kwargs) [default3]: pretrain( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 
34, in build_pretraining_data_loader [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: pretrain( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: train_dataloader = build_pretraining_data_loader( [default4]: assert self.consumed_samples < self.total_samples, \ [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
[default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 
[default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = 
build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 
12547659 [default5]:Traceback (most recent call last): [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: pretrain( [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: 
train_dataloader = build_pretraining_data_loader( [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default5]: pretrain( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left 
to consume: 178402224, 12547659 [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default2]:Traceback (most recent call last): [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]: pretrain( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: return f(*args, **kwargs) [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default4]: train_dataloader = build_pretraining_data_loader( [default2]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 
[default4]:Traceback (most recent call last): [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( 
[default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default0]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( 
[default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_dataloader = build_pretraining_data_loader( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = 
MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default6]: assert self.consumed_samples < 
self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default1]:Traceback (most recent call last): [default1]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, 
**kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default7]:    train_dataloader = build_pretraining_data_loader(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default7]:    batch_sampler = MegatronPretrainingSampler(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default7]:    assert self.consumed_samples < self.total_samples, \
[default7]:AssertionError: no samples left to consume: 178402224, 12547659
[default6]:Traceback (most recent call last):
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]:    main()
[default6]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]:    return f(*args, **kwargs)
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]:    pretrain(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default6]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default6]:    train_dataloader = build_pretraining_data_loader(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default6]:    batch_sampler = MegatronPretrainingSampler(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default6]:    assert self.consumed_samples < self.total_samples, \
[default6]:AssertionError: no samples left to consume: 178402224, 12547659
[default3]:GOTCONSUMEDSAMPLES 178402224 5030720
[default0]:GOTCONSUMEDSAMPLES 178402224 5030720
[default6]:GOTCONSUMEDSAMPLES 178402224 5030720
[default5]:Traceback (most recent call last):
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]:    main()
[default5]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default5]:    return f(*args, **kwargs)
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]:    pretrain(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default5]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default5]:    train_dataloader = build_pretraining_data_loader(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default5]:    batch_sampler = MegatronPretrainingSampler(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default5]:    assert self.consumed_samples <
self.total_samples, \
[default5]:AssertionError: no samples left to consume: 178402224, 12547659
[default6]:Traceback (most recent call last):
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]:    main()
[default6]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]:    return f(*args, **kwargs)
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]:    pretrain(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default6]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default6]:    train_dataloader = build_pretraining_data_loader(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default6]:    batch_sampler = MegatronPretrainingSampler(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default6]:    assert self.consumed_samples < self.total_samples, \
[default6]:AssertionError: no samples left to consume: 178402224, 12547659
[default3]:Traceback (most recent call last):
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]:    main()
[default3]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]:    return f(*args, **kwargs)
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]:    pretrain(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default3]:    train_dataloader = build_pretraining_data_loader(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default3]:    batch_sampler = MegatronPretrainingSampler(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default3]:    assert self.consumed_samples < self.total_samples, \
[default3]:AssertionError: no samples left to consume: 178402224, 12547659
[default4]:Traceback (most recent call last):
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default4]:    main()
[default4]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default4]:    return f(*args, **kwargs)
[default4]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default4]:    pretrain(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default4]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default4]:    train_dataloader = build_pretraining_data_loader(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default4]:    batch_sampler = MegatronPretrainingSampler(
[default4]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default4]:    assert self.consumed_samples < self.total_samples, \
[default4]:AssertionError: no samples left to consume: 178402224, 12547659
[default3]:Traceback (most recent call last):
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default3]:    main()
[default3]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default3]:    return f(*args, **kwargs)
[default3]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default3]:    pretrain(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default3]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default3]:    train_dataloader = build_pretraining_data_loader(
[default3]:  File
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default3]:    batch_sampler = MegatronPretrainingSampler(
[default3]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default3]:    assert self.consumed_samples < self.total_samples, \
[default3]:AssertionError: no samples left to consume: 178402224, 12547659
[default2]:Traceback (most recent call last):
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]:Traceback (most recent call last):
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]:    main()
[default2]:    main()
[default2]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default2]:    return f(*args, **kwargs)
[default2]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default2]:    pretrain(
[default7]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]:    return f(*args, **kwargs)
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]:    pretrain(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default2]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default7]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default7]:    train_dataloader = build_pretraining_data_loader(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default2]:    train_dataloader = build_pretraining_data_loader(
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default7]:    batch_sampler = MegatronPretrainingSampler(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default7]:    assert self.consumed_samples < self.total_samples, \
[default2]:    batch_sampler = MegatronPretrainingSampler(
[default2]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default7]:AssertionError: no samples left to consume: 178402224, 12547659
[default2]:    assert self.consumed_samples < self.total_samples, \
[default2]:AssertionError: no samples left to consume: 178402224, 12547659
[default5]:Traceback (most recent call last):
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]:    main()
[default5]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default5]:    return f(*args, **kwargs)
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]:    pretrain(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default5]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default5]:    train_dataloader = build_pretraining_data_loader(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default5]:    batch_sampler = MegatronPretrainingSampler(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default5]:    assert self.consumed_samples < self.total_samples, \
[default5]:AssertionError: no samples left to consume: 178402224, 12547659
[default5]:    train_dataloader = build_pretraining_data_loader(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default5]:    batch_sampler = MegatronPretrainingSampler(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default5]:    assert self.consumed_samples < self.total_samples, \
[default5]:AssertionError: no samples left to consume: 178402224, 12547659
[default7]:    assert self.consumed_samples < self.total_samples, \
[default7]:AssertionError: no samples left to consume: 178402224, 12547659
[default6]:Traceback (most recent call last):
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]:    main()
[default6]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]:    return f(*args, **kwargs)
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]:    pretrain(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default6]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default6]:    train_dataloader = build_pretraining_data_loader(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default6]:    batch_sampler = MegatronPretrainingSampler(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default6]:    assert self.consumed_samples <
self.total_samples, \
[default6]:AssertionError: no samples left to consume: 178402224, 12547659
[default7]:Traceback (most recent call last):
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]:    main()
[default7]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]:    return f(*args, **kwargs)
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]:    pretrain(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default7]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default7]:    train_dataloader = build_pretraining_data_loader(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default7]:    batch_sampler = MegatronPretrainingSampler(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default7]:    assert self.consumed_samples < self.total_samples, \
[default7]:AssertionError: no samples left to consume: 178402224, 12547659
[default6]:Traceback (most recent call last):
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default6]:    main()
[default6]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default6]:    return f(*args, **kwargs)
[default6]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default6]:    pretrain(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default6]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default6]:    train_dataloader = build_pretraining_data_loader(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default6]:    batch_sampler = MegatronPretrainingSampler(
[default6]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default6]:    assert self.consumed_samples < self.total_samples, \
[default6]:AssertionError: no samples left to consume: 178402224, 12547659
[default5]:Traceback (most recent call last):
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default5]:    main()
[default5]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default5]:    return f(*args, **kwargs)
[default5]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default5]:    pretrain(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default5]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default5]:    train_dataloader = build_pretraining_data_loader(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
[default5]:    batch_sampler = MegatronPretrainingSampler(
[default5]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
[default5]:    assert self.consumed_samples < self.total_samples, \
[default5]:AssertionError: no samples left to consume: 178402224, 12547659
[default7]:Traceback (most recent call last):
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module>
[default7]:    main()
[default7]:  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
[default7]:    return f(*args, **kwargs)
[default7]:  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
[default7]:    pretrain(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
[default7]:    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
[default7]:    train_dataloader =
build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: pretrain( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default5]:Traceback (most recent call last): [default5]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, 
**kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 178402224, 12547659 
[default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 178402224, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 178402224, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 178402224, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < 
self.total_samples, \ [default6]:AssertionError: no samples left to consume: 178402224, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 178402224, 12547659 WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3116743 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4011269 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2734779 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1416453 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3078886 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2269269) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1988384) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3689948) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3137952) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2059516) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1648673) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3704955) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3249428) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 609585) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2325224) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2767137) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3749387) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 610346) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3028998) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1024911) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1812596) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1875247) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1677170) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2077341) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 345304) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3733051) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2068192) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3881191) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1470860) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 504625) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3116742) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 516777) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4011267) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 466991) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1416454) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2113054) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1540512) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 3078887) of 
binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1897046) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2734780) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4072084) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise ChildFailedError( return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1416455) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1416456) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = 
build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 220 (local_rank: 4) exitcode : 1 (pid: 1416457) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1416458) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 222 (local_rank: 6) exitcode : 1 (pid: 1416459) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 223 (local_rank: 7) exitcode : 1 (pid: 1416460) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1416454) error_file: /tmp/torchelastic_kfajjvhp/none_zt8p4b37/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( main() return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in 
__call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 345305) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 345306) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 345307) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/3/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 345308) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < 
self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 345309) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 466992) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader raise ChildFailedError( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 126 (local_rank: 6) exitcode : 1 (pid: 345310) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 127 (local_rank: 7) exitcode : 1 (pid: 345311) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 345304) error_file: /tmp/torchelastic_c5__6k_a/none_fgdnrkk9/attempt_0/0/error.json traceback : Traceback (most recent call last): elastic_launch( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 466993) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 33 (local_rank: 1) exitcode : 1 (pid: 3749388) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 466994) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return 
launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 65 (local_rank: 1) exitcode : 1 (pid: 2068193) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( elastic_launch( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 466995) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ [2]: time : 2022-09-05_14:19:22 host : 
jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 3749389) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3749390) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 2068194) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 466996) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 67 (local_rank: 3) exitcode : 1 (pid: 2068195) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( raise ChildFailedError( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 166 (local_rank: 6) exitcode : 1 (pid: 466997) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3749391) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2767138) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2269270) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 2068196) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 167 (local_rank: 7) exitcode : 1 (pid: 466998) error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3749392) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to 
consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 466991) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 2068197) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_3656aus9/none_hi53hess/attempt_0/0/error.json traceback : Traceback (most recent call last): batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3749393) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2767139) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2269271) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2734781) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2767140) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2269272) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 70 (local_rank: 6) exitcode : 1 (pid: 2068198) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ return launch_agent(self._config, self._entrypoint, list(args)) [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 39 (local_rank: 7) exitcode : 1 (pid: 3749394) error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 99 (local_rank: 3) exitcode : 1 (pid: 2734782) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3749387) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2734783) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/4/error.json traceback : Traceback (most recent call last): [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 2068199) error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent error_file: /tmp/torchelastic_8yrxwo5x/none_8u7djo6q/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2767141) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2269273) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/4/error.json traceback : Traceback (most recent call last): 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam11-ib0 rank : 64 (local_rank: 0) exitcode : 1 (pid: 2068192) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( error_file: /tmp/torchelastic__gzey3u6/none_uplcp98f/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2767142) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/5/error.json traceback : Traceback (most 
recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2269274) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2734784) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ raise ChildFailedError( raise ChildFailedError( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < 
self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 230 (local_rank: 6) exitcode : 1 (pid: 2767143) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 94 (local_rank: 6) exitcode : 1 (pid: 2269275) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 102 (local_rank: 6) exitcode : 1 (pid: 2734785) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, 
in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 231 (local_rank: 7) exitcode : 1 (pid: 2767144) error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 95 (local_rank: 7) exitcode : 1 (pid: 2269276) error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 103 (local_rank: 7) exitcode : 1 (pid: 2734786) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2767137) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2269269) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1875248) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent error_file: /tmp/torchelastic_q79kzr95/none_zwmyzwhu/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_vb6q0z_i/none_rt2qewax/attempt_0/0/error.json traceback : Traceback (most recent call last): ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2734780) error_file: /tmp/torchelastic_jaffk9m_/none_fr0giduh/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1540513) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1897047) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1875249) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1875250) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1540514) error_file: 
/tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1540515) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1875251) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1897048) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 187 (local_rank: 3) exitcode 
: 1 (pid: 1897049) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/3/error.json traceback : Traceback (most recent call last): [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1875252) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1540516) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 286 (local_rank: 6) exitcode : 1 (pid: 1875253) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1540517) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1897050) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1470861) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader 
batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 287 (local_rank: 7) exitcode : 1 (pid: 1875254) error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 110 (local_rank: 6) exitcode : 1 (pid: 1540518) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1897051) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1875247) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader error_file: /tmp/torchelastic_mj4ro0vj/none_ukf1skt1/attempt_0/0/error.json traceback : Traceback (most recent call last): main() [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 111 (local_rank: 7) exitcode : 1 (pid: 1540519) error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 190 (local_rank: 6) exitcode : 1 (pid: 1897052) error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam19-ib0 rank : 104 
(local_rank: 0) exitcode : 1 (pid: 1540512) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1470862) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples 
left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ error_file: /tmp/torchelastic_y2k47xui/none_s6s8q8ki/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 211 (local_rank: 3) exitcode : 1 (pid: 1470863) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 191 (local_rank: 7) exitcode : 1 (pid: 1897053) error_file: 
/tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1897046) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( error_file: /tmp/torchelastic_xgfwrjmt/none_124riu04/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1470864) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1470865) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/5/error.json traceback : Traceback (most 
recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader raise ChildFailedError( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 214 (local_rank: 6) exitcode : 1 (pid: 1470866) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 exec(code, run_globals) [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 215 (local_rank: 7) exitcode : 1 (pid: 1470867) error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1470860) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2113055) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_qhlzdi56/none_65pdl4d4/attempt_0/0/error.json 
traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1648674) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 4072085) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 (pid: 2113056) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 59 (local_rank: 3) exitcode : 1 (pid: 2113057) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1648675) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 (pid: 4072086) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) exitcode : 1 (pid: 1648676) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 4072087) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2113058) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2113059) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/5/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1648677) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 4072088) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 62 (local_rank: 6) exitcode : 1 (pid: 2113060) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1648678) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 4072089) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 63 (local_rank: 7) exitcode : 1 (pid: 2113061) error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 182 (local_rank: 6) exitcode : 1 (pid: 1648679) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam09-ib0 rank : 56 (local_rank: 0) exitcode : 1 (pid: 2113054) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, 
in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 4072090) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_ysg9_e4z/none_4kf12giz/attempt_0/0/error.json traceback : Traceback (most recent call last): [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 183 (local_rank: 7) exitcode : 1 (pid: 1648680) error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1648673) [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 4072091) error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ error_file: /tmp/torchelastic_1nbbd9id/none_358vozfv/attempt_0/0/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 4072084) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( error_file: /tmp/torchelastic_n_cyvzf0/none_u5tbegkv/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, 
in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 17 (local_rank: 1) exitcode : 1 (pid: 2077342) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 2077343) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : 
jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 2077344) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 2077345) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 21 (local_rank: 5) exitcode : 1 (pid: 2077346) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 22 (local_rank: 6) exitcode : 1 (pid: 2077347) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 23 (local_rank: 7) exitcode : 1 (pid: 2077348) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 
(pid: 2077341) error_file: /tmp/torchelastic_d4awin1v/none_wk5rlcuv/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 193 (local_rank: 1) exitcode : 1 (pid: 3249429) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 194 (local_rank: 2) exitcode : 1 (pid: 3249430) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3249431) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/3/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3249432) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3249433) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 198 (local_rank: 6) exitcode : 1 (pid: 3249434) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 199 (local_rank: 7) exitcode : 1 (pid: 3249435) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root 
Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3249428) error_file: /tmp/torchelastic_10t3a3rx/none_rhvfv2g1/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return 
_run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise ChildFailedError( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 3733052) error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/1/error.json traceback : Traceback (most 
recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 
1 (pid: 3733053) error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3733054) error_file: 
/tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/3/error.json traceback : Traceback (most recent call last): exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): 
File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3733055) error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 return _run_code(code, main_globals, None, [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3733056) error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, 
main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, main() exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3733057) error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, return _run_code(code, main_globals, None, [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3733058) error_file: 
/tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : 
jean-zay-iam02-ib0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 3733051) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in error_file: /tmp/torchelastic_w9xid6uh/none_v49sr7oi/attempt_0/0/error.json traceback : Traceback (most recent call last): exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, 
**kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 
752, in run return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( raise ChildFailedError( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 153 (local_rank: 1) exitcode : 1 (pid: 609586) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 516778) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 609587) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 516779) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 
609588) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 516780) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/3/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( elastic_launch( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 609589) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", 
line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 516781) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ 
Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 129 (local_rank: 1) exitcode : 1 (pid: 3704956) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 3078888) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 609590) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 516782) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3137953) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 609591) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 118 (local_rank: 6) exitcode : 1 (pid: 516783) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/6/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3881192) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/1/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 610347) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 1024912) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 159 (local_rank: 7) exitcode : 1 (pid: 609592) error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: 
time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 119 (local_rank: 7) exitcode : 1 (pid: 516784) error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3137954) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 265 
(local_rank: 1) exitcode : 1 (pid: 4011268) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1812597) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 3078889) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam32-ib0 rank : 152 (local_rank: 0) exitcode : 1 (pid: 609585) raise ChildFailedError( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam26-ib0 rank : 112 (local_rank: 0) exitcode : 1 (pid: 516777) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 raise ChildFailedError( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 130 (local_rank: 2) exitcode : 1 (pid: 3704957) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to 
consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 3078890) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/4/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_pou3k6nx/none_1c2me9wd/attempt_0/0/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent error_file: /tmp/torchelastic_7m0k_cvt/none_3wrv05u7/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3137955) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 146 (local_rank: 2) exitcode : 1 (pid: 610348) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 1024913) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 267 (local_rank: 3) exitcode : 1 (pid: 4011270) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3704958) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/3/error.json traceback : Traceback (most 
recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 257 (local_rank: 1) exitcode : 1 (pid: 504626) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1812598) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3881193) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 2059517) error_file: 
/tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 610349) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 1024914) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 4011271) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1812599) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 3881194) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 3028999) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 81 (local_rank: 1) exitcode : 1 (pid: 2325225) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1988385) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED 
------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3689949) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 3078891) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 2059518) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3116744) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3137956) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3704959) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 504627) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/2/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1677171) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return 
f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 3029000) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2325226) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 2059519) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1988386) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 610350) error_file: 
/tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 1024915) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 4011272) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3689950) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 504628) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1812600) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3881195) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 246 (local_rank: 6) exitcode : 1 (pid: 3078892) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 3029001) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2325227) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3116745) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1988387) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/3/error.json traceback : Traceback (most recent call last): [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3137957) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 139 (local_rank: 3) exitcode : 1 (pid: 3689951) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/3/error.json traceback : Traceback (most recent call last): [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3704960) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator 
= build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [2]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1677172) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3116746) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, 
in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 610351) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 1024916) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 270 (local_rank: 6) exitcode : 1 (pid: 
4011273) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1812601) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3881196) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 247 (local_rank: 7) exitcode : 1 (pid: 3078893) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [3]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1677173) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 2059520) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/4/error.json traceback 
: Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 238 (local_rank: 6) exitcode : 1 (pid: 3137958) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 134 (local_rank: 6) exitcode : 1 (pid: 3704961) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 504629) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 3029002) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2325228) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 
rank : 12 (local_rank: 4) exitcode : 1 (pid: 1988388) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 150 (local_rank: 6) exitcode : 1 (pid: 610352) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/6/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 278 (local_rank: 6) exitcode : 1 (pid: 1024917) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 271 (local_rank: 7) exitcode : 1 (pid: 4011274) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3689952) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 174 (local_rank: 6) exitcode : 1 (pid: 1812602) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 206 (local_rank: 6) exitcode : 1 (pid: 3881197) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 3078887) error_file: /tmp/torchelastic_7pkkks0e/none_r4tcfqn0/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = 
build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 2059521) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3116747) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 239 (local_rank: 7) exitcode : 1 (pid: 3137959) error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : 
jean-zay-iam28-ib0 rank : 135 (local_rank: 7) exitcode : 1 (pid: 3704962) error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 504630) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [4]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 252 (local_rank: 4) exitcode : 1 (pid: 1677174) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 3029003) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2325229) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1988389) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam42-ib0 rank : 232 (local_rank: 0) exitcode : 1 
(pid: 3137952) [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 151 (local_rank: 7) exitcode : 1 (pid: 610353) error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 279 (local_rank: 7) exitcode : 1 (pid: 1024918) error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) exitcode : 1 (pid: 4011267) error_file: /tmp/torchelastic_i4qwt8rb/none_ip66frii/attempt_0/0/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3689953) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3704955) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 175 (local_rank: 7) exitcode : 1 (pid: 1812603) error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 207 (local_rank: 7) exitcode : 1 (pid: 3881198) error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 78 (local_rank: 6) exitcode : 1 (pid: 2059522) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3116748) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, 
in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader error_file: /tmp/torchelastic_n0i1198l/none_q2eifbl6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 610346) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam47-ib0 rank : 272 (local_rank: 0) exitcode : 1 (pid: 1024911) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader error_file: /tmp/torchelastic_siyip2m8/none_nkbz7vkm/attempt_0/0/error.json traceback : Traceback (most recent call last): batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 262 (local_rank: 6) 
exitcode : 1 (pid: 504631) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1812596) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3881191) [5]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 1 (pid: 1677175) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 54 (local_rank: 6) exitcode : 1 (pid: 3029004) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 86 (local_rank: 6) exitcode : 1 (pid: 2325230) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 14 (local_rank: 6) exitcode : 1 (pid: 1988390) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( error_file: /tmp/torchelastic_wtwepkk3/none_ny5uewaz/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_xvrrdfx7/none_gzda0pji/attempt_0/0/error.json traceback : Traceback (most recent call last): batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 142 (local_rank: 6) exitcode : 1 (pid: 3689954) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = 
build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 error_file: /tmp/torchelastic_scsjrguq/none_7zqxvvy6/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_tjjbus70/none_tznibtzw/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 79 (local_rank: 7) exitcode : 1 (pid: 2059523) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 31 (local_rank: 7) exitcode : 1 (pid: 3116749) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, 
in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 504632) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [6]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 254 (local_rank: 6) exitcode : 1 (pid: 1677176) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 55 (local_rank: 7) exitcode : 1 (pid: 3029005) error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 87 (local_rank: 7) exitcode : 1 (pid: 2325231) error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed 
failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 2059516) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 15 (local_rank: 7) exitcode : 1 (pid: 1988391) error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ [7]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 143 (local_rank: 7) exitcode : 1 (pid: 3689955) error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 504625) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 
============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam08-ib0 rank : 48 (local_rank: 0) exitcode : 1 (pid: 3028998) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2325224) error_file: /tmp/torchelastic_5e_7lrdf/none_69qgvl2f/attempt_0/0/error.json traceback : Traceback (most recent call last): ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam05-ib0 rank : 24 (local_rank: 0) exitcode : 1 (pid: 3116742) error_file: /tmp/torchelastic_4_kbjeyz/none_w133gg7l/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1988384) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam30-ib0 rank : 136 (local_rank: 0) exitcode : 1 (pid: 3689948) error_file: /tmp/torchelastic_qinnvi03/none_oc51l2zn/attempt_0/0/error.json traceback : Traceback (most recent call last): [7]: time : 
2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 255 (local_rank: 7) exitcode : 1 (pid: 1677178) error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_1gpgieh2/none_7ftn1qym/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_fxzdb0c3/none_ikmtcecd/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ error_file: /tmp/torchelastic_699wjn4z/none_cnz0ng68/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_th6dmst7/none_ieow_gps/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:19:22 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1677170) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ error_file: /tmp/torchelastic_y9plye7r/none_8p6ilzjb/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 178402224, 12547659 ============================================================ srun: error: jean-zay-iam34: task 21: Exited with exit code 1 srun: launch/slurm: _step_signal: Terminating StepId=1006044.0 srun: error: jean-zay-iam42: task 29: Exited with exit code 1 srun: error: jean-zay-iam19: task 13: Exited with exit code 1 srun: error: jean-zay-iam15: task 11: Exited with exit code 1 srun: error: jean-zay-iam28: task 16: Exited with exit code 1 srun: error: jean-zay-iam09: task 7: Exited with exit code 1 srun: error: jean-zay-iam36: task 23: Exited with exit code 1 srun: error: jean-zay-iam44: task 31: Exited with exit code 1 srun: error: jean-zay-iam11: task 8: Exited with exit code 1 srun: error: jean-zay-iam05: task 3: Exited with exit code 1 srun: error: jean-zay-iam45: task 32: Exited with exit code 1 srun: error: jean-zay-iam33: task 20: Exited with exit code 1 srun: error: jean-zay-iam18: task 12: Exited with exit code 1 srun: error: jean-zay-iam27: task 15: Exited with exit code 1 srun: error: jean-zay-iam07: task 5: Exited with exit code 1 srun: error: jean-zay-iam14: task 10: Exited with exit code 1 srun: error: jean-zay-iam38: task 25: Exited with exit code 1 srun: error: jean-zay-iam31: task 18: Exited with exit code 1 
srun: error: jean-zay-iam04: task 2: Exited with exit code 1
srun: error: jean-zay-iam46: task 33: Exited with exit code 1
srun: error: jean-zay-iam47: task 34: Exited with exit code 1
srun: error: jean-zay-iam32: task 19: Exited with exit code 1
srun: error: jean-zay-iam03: task 1: Exited with exit code 1
srun: error: jean-zay-iam08: task 6: Exited with exit code 1
srun: error: jean-zay-iam40: task 27: Exited with exit code 1
srun: error: jean-zay-iam41: task 28: Exited with exit code 1
srun: error: jean-zay-iam35: task 22: Exited with exit code 1
srun: error: jean-zay-iam02: task 0: Exited with exit code 1
srun: error: jean-zay-iam37: task 24: Exited with exit code 1
srun: error: jean-zay-iam13: task 9: Exited with exit code 1
srun: error: jean-zay-iam39: task 26: Exited with exit code 1
srun: error: jean-zay-iam43: task 30: Exited with exit code 1
srun: error: jean-zay-iam26: task 14: Exited with exit code 1
srun: error: jean-zay-iam30: task 17: Exited with exit code 1
srun: error: jean-zay-iam06: task 4: Exited with exit code 1
srun: error: jean-zay-iam52: task 35: Exited with exit code 1
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
***************************************** [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 
1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. 
single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.1006357.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 10 [default0]: eval_only ....................................... True [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 
0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... 
True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ 
True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. None [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 
256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 
6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... 
[['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... [['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... 
False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default7]:> setting tensorboard ... [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ 
None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-05 14:21:30,008] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-05 14:21:36,157] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.088 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.111 seconds [default0]:time to initialize megatron (seconds): -40.642 [default0]:[after megatron is initialized] datetime: 2022-09-05 14:21:43 [default0]:building GPT model ... 
[default0]:[2022-09-05 14:21:43,403] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-05 14:21:43,404] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-05 14:21:43,404] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.1 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-05 14:21:47,276] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe 
[default0]:stage=7 layers=1 [default0]: 9: ParallelTransformerLayerPipe [default0]:stage=8 layers=1 [default0]: 10: ParallelTransformerLayerPipe [default0]:stage=9 layers=1 [default0]: 11: ParallelTransformerLayerPipe [default0]:stage=10 layers=1 [default0]: 12: ParallelTransformerLayerPipe [default0]:stage=11 layers=1 [default0]: 13: ParallelTransformerLayerPipe [default0]:stage=12 layers=1 [default0]: 14: ParallelTransformerLayerPipe [default0]:stage=13 layers=1 [default0]: 15: ParallelTransformerLayerPipe [default0]:stage=14 layers=1 [default0]: 16: ParallelTransformerLayerPipe [default0]:stage=15 layers=1 [default0]: 17: ParallelTransformerLayerPipe [default0]:stage=16 layers=1 [default0]: 18: ParallelTransformerLayerPipe [default0]:stage=17 layers=1 [default0]: 19: ParallelTransformerLayerPipe [default0]:stage=18 layers=1 [default0]: 20: ParallelTransformerLayerPipe [default0]:stage=19 layers=1 [default0]: 21: ParallelTransformerLayerPipe [default0]:stage=20 layers=1 [default0]: 22: ParallelTransformerLayerPipe [default0]:stage=21 layers=1 [default0]: 23: ParallelTransformerLayerPipe [default0]:stage=22 layers=1 [default0]: 24: ParallelTransformerLayerPipe [default0]:stage=23 layers=1 [default0]: 25: ParallelTransformerLayerPipe [default0]:stage=24 layers=1 [default0]: 26: ParallelTransformerLayerPipe [default0]:stage=25 layers=1 [default0]: 27: ParallelTransformerLayerPipe [default0]:stage=26 layers=1 [default0]: 28: ParallelTransformerLayerPipe [default0]:stage=27 layers=1 [default0]: 29: ParallelTransformerLayerPipe [default0]:stage=28 layers=1 [default0]: 30: ParallelTransformerLayerPipe [default0]:stage=29 layers=1 [default0]: 31: ParallelTransformerLayerPipe [default0]:stage=30 layers=1 [default0]: 32: ParallelTransformerLayerPipe [default0]:stage=31 layers=1 [default0]: 33: ParallelTransformerLayerPipe [default0]:stage=32 layers=1 [default0]: 34: ParallelTransformerLayerPipe [default0]:stage=33 layers=1 [default0]: 35: ParallelTransformerLayerPipe 
[default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe 
[default0]:stage=61 layers=1 [default0]: 63: ParallelTransformerLayerPipe [default0]:stage=62 layers=1 [default0]: 64: ParallelTransformerLayerPipe [default0]:stage=63 layers=1 [default0]: 65: ParallelTransformerLayerPipe [default0]:stage=64 layers=1 [default0]: 66: ParallelTransformerLayerPipe [default0]:stage=65 layers=1 [default0]: 67: ParallelTransformerLayerPipe [default0]:stage=66 layers=1 [default0]: 68: ParallelTransformerLayerPipe [default0]:stage=67 layers=1 [default0]: 69: ParallelTransformerLayerPipe [default0]:stage=68 layers=1 [default0]: 70: ParallelTransformerLayerPipe [default0]:stage=69 layers=1 [default0]: 71: ParallelTransformerLayerPipe [default0]:stage=70 layers=3 [default0]: 72: ParallelTransformerLayerPipe [default0]: 73: undo [default0]: 74: MixedFusedLayerNorm [default0]:stage=71 layers=2 [default0]: 75: EmbeddingPipe [default0]: 76: float16_to_fp32 [default0]: loss: CrossEntropy [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default3]:Building extension module utils... [default3]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default3]:ninja: no work to do. [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.27982521057128906 seconds
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.08896923065185547 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:[2022-09-05 14:21:49,011] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-05 14:21:49,011] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-05 14:21:49,011] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.48 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-05 14:21:49,012] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default1]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default1]:Building extension module utils...
[default1]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.20621681213378906 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.2061750888824463 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20606732368469238 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20616364479064941 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20590710639953613 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20689988136291504 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2059328556060791 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20648431777954102 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20624995231628418 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20596933364868164 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2069101333618164 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20605826377868652 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2069990634918213 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20617055892944336 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20543789863586426 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20722675323486328 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2066962718963623 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2077772617340088 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20724058151245117 seconds [default3]:Loading extension module utils... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.20691490173339844 seconds [default3]:Time to load utils op: 0.20646095275878906 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20672941207885742 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20644354820251465 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20710420608520508 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21024727821350098 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20646119117736816 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20644116401672363 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20281696319580078 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20301103591918945 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21024155616760254 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.207183837890625 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21025705337524414 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024848461151123 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20531082153320312 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2055363655090332 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20300865173339844 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20233917236328125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20481562614440918 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.2152416706085205 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2160353660583496 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20435523986816406 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20567822456359863 seconds [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20425653457641602 seconds [default5]:Time to load utils op: 0.20575499534606934 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21026277542114258 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20685315132141113 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20245051383972168 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20246553421020508 seconds [default3]:Loading extension module utils... [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21466398239135742 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20244789123535156 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20225906372070312 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default5]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.7329585552215576 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.21219682693481445 seconds [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default3]:Loading extension module utils... [default2]:Loading extension module utils... [default3]:Time to load utils op: 0.37554359436035156 seconds [default2]:Time to load utils op: 0.3757059574127197 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3081369400024414 seconds [default3]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20458245277404785 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21435284614562988 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20430684089660645 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20267796516418457 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2140350341796875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.219085693359375 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21503043174743652 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2149510383605957 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3568904399871826 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21180200576782227 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.35690760612487793 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21910977363586426 seconds [default6]:Loading extension module utils... [default4]:Loading extension module utils... 
[default6]:Time to load utils op: 0.2190568447113037 seconds [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21907329559326172 seconds [default4]:Time to load utils op: 0.21908926963806152 seconds [default2]:Time to load utils op: 0.21460700035095215 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2190709114074707 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21431422233581543 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.35689401626586914 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21182489395141602 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21177411079406738 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2117900848388672 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21181631088256836 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21449518203735352 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21450567245483398 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2057499885559082 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21438169479370117 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21498656272888184 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2050936222076416 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21326327323913574 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2150740623474121 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.30713367462158203 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21204209327697754 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3142263889312744 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21178865432739258 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2117781639099121 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21426129341125488 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2142627239227295 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2143716812133789 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21438336372375488 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21459269523620605 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21183228492736816 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3140106201171875 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31424927711486816 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21219897270202637 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3568873405456543 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21430540084838867 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.4064483642578125 seconds [default1]:ninja: no work to do. [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.34955334663391113 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005295276641845703 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3364126682281494 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3366677761077881 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.4062333106994629 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.4062831401824951 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3372225761413574 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2142646312713623 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20524072647094727 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2150120735168457 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2052459716796875 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31397390365600586 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30702924728393555 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20554113388061523 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30257558822631836 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.7931511402130127 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.7926502227783203 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.7930154800415039 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3380885124206543 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.7928273677825928 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3857426643371582 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3857593536376953 seconds [default1]:Time to load utils op: 0.38573479652404785 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.38573765754699707 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.337998628616333 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3029320240020752 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3380141258239746 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.33806371688842773 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3027939796447754 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30259108543395996 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.33022427558898926 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3029193878173828 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3377995491027832 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.33022522926330566 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.30280041694641113 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30277347564697266 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30250000953674316 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.32092857360839844 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.32126832008361816 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3083775043487549 seconds [default7]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.32117629051208496 seconds [default7]:Time to load utils op: 0.32072997093200684 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3083953857421875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37645626068115234 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3085062503814697 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3027355670928955 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30835461616516113 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3299574851989746 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31253528594970703 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3125293254852295 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3125624656677246 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3022584915161133 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.33005595207214355 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3026411533355713 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3025805950164795 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004286766052246094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3082718849182129 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00047469139099121094 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3026919364929199 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007777214050292969 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3026611804962158 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3026707172393799 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0004773139953613281 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30242061614990234 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3026864528656006 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3027174472808838 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3129570484161377 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.31282472610473633 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3028852939605713 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.31252098083496094 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3761024475097656 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.306133508682251 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005660057067871094 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30856919288635254 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30272865295410156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000522613525390625 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.30613064765930176 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30268287658691406 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30614686012268066 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30248498916625977 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.305983304977417 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30599474906921387 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30602073669433594 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3060135841369629 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30283665657043457 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.302670955657959 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31217312812805176 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005044937133789062 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007450580596923828 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31215715408325195 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.30268096923828125 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30612874031066895 seconds [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3024764060974121 seconds [default4]:Time to load utils op: 0.3238189220428467 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3239271640777588 seconds [default5]:Time to load utils op: 0.323866605758667 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.32355165481567383 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3027341365814209 seconds [default7]:Time to load utils op: 0.30266857147216797 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3577558994293213 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30271291732788086 seconds [default3]:Time to load utils op: 0.34662675857543945 seconds [default0]:Time to load utils op: 0.34662485122680664 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30506086349487305 seconds [default2]:Time to load utils op: 0.34662580490112305 seconds [default1]:Time to load utils op: 0.34661388397216797 seconds [default0]:Time to load utils op: 0.7329537868499756 seconds [default1]:Time to load utils op: 0.7328369617462158 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3024442195892334 seconds [default5]:Time to load utils op: 0.3079802989959717 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3075571060180664 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3028247356414795 seconds [default1]:Time to load utils op: 0.35775256156921387 seconds [default3]:Time to load utils op: 0.35770225524902344 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3025319576263428 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.001718282699584961 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30272960662841797 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3028690814971924 seconds [default2]:Time to load utils op: 0.35772085189819336 seconds [default1]:Time to load utils op: 0.36669039726257324 seconds [default6]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30243635177612305 seconds [default6]:Time to load utils op: 0.3131873607635498 seconds [default0]:Time to load utils op: 0.36670446395874023 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30215954780578613 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.302509069442749 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3075532913208008 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30242490768432617 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3022170066833496 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3075582981109619 seconds [default3]:Time to load utils op: 0.3667006492614746 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.3131141662597656 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30278563499450684 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3026413917541504 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006697177886962891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004112720489501953 seconds [default2]:Time to load utils op: 0.36668920516967773 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3023946285247803 seconds [default3]:Time to load utils op: 0.00044226646423339844 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3075718879699707 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30246448516845703 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3021965026855469 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3085649013519287 seconds [default1]:Time to load utils op: 0.3082756996154785 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3772897720336914 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3025643825531006 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30239009857177734 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00047135353088378906 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.302748441696167 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30471158027648926 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3050661087036133 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3024897575378418 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005929470062255859 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3131136894226074 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3046987056732178 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.37734246253967285 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3772599697113037 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.3051273822784424 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3773350715637207 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30828309059143066 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30512142181396484 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30511999130249023 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.308215856552124 seconds [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30277156829833984 seconds [default5]:Time to load utils op: 0.302689790725708 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3025693893432617 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00039958953857421875 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.307326078414917 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3072783946990967 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3052060604095459 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0003218650817871094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3056519031524658 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0018160343170166016 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30254578590393066 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.30272769927978516 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3024420738220215 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004856586456298828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006289482116699219 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30263853073120117 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3027048110961914 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30286407470703125 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.305483341217041 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.30245208740234375 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3024933338165283 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006234645843505859 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3131983280181885 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30269956588745117 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3025400638580322 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30228447914123535 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.302295446395874 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3051185607910156 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3094151020050049 seconds [default4]:Time to load utils op: 0.309420108795166 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30266404151916504 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3094179630279541 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.30511045455932617 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30942559242248535 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00086212158203125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004951953887939453 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007383823394775391 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000701904296875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006687641143798828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005128383636474609 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007939338684082031 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007734298706054688 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004856586456298828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004296302795410156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004417896270751953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004870891571044922 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007989406585693359 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005223751068115234 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006816387176513672 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006194114685058594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006861686706542969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006248950958251953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007746219635009766 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007660388946533203 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004622936248779297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006837844848632812 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00077056884765625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005285739898681641 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005648136138916016 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005283355712890625 seconds [default0]:Time to load utils op: 0.0004942417144775391 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006079673767089844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007956027984619141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006480216979980469 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0010099411010742188 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008678436279296875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005288124084472656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004787445068359375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008540153503417969 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006716251373291016 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010573863983154297 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001033782958984375 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000995635986328125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004680156707763672 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009381771087646484 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008265972137451172 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009028911590576172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006539821624755859 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0010023117065429688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.001024007797241211 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004858970642089844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0006632804870605469 seconds [default4]:Time to load utils op: 0.0008304119110107422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009095668792724609 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00044918060302734375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005078315734863281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007059574127197266 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006117820739746094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007121562957763672 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.001026153564453125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default2]:Time to load utils op: 0.000850677490234375 seconds [default6]:Time to load utils op: 0.00046324729919433594 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007271766662597656 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0007655620574951172 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007321834564208984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00047278404235839844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005624294281005859 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005402565002441406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006849765777587891 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000537872314453125 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005497932434082031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0014615058898925781 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004189014434814453 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005438327789306641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006756782531738281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008502006530761719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004949569702148438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005555152893066406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005211830139160156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007777214050292969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008842945098876953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004584789276123047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Time to load utils op: 0.0007281303405761719 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005891323089599609 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00051116943359375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005638599395751953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004887580871582031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003979206085205078 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004742145538330078 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0010044574737548828 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009851455688476562 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000522613525390625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004553794860839844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000553131103515625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011649131774902344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043582916259765625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006148815155029297 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010039806365966797 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005919933319091797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003476142883300781 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004999637603759766 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Time to load utils op: 0.0007166862487792969 seconds [default6]:Time to load utils op: 0.0005192756652832031 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004076957702636719 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00037169456481933594 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008590221405029297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000530242919921875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0014824867248535156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000457763671875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0010182857513427734 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0010280609130859375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004782676696777344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009167194366455078 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004715919494628906 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010325908660888672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005192756652832031 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00041294097900390625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Time to load utils op: 0.0006902217864990234 seconds [default1]:Time to load utils op: 0.0007450580596923828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00046062469482421875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008518695831298828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007150173187255859 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006635189056396484 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0006759166717529297 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009486675262451172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006504058837890625 seconds [default3]:Time to load utils op: 0.0009250640869140625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008347034454345703 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007686614990234375 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0005946159362792969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007572174072265625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006742477416992188 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006241798400878906 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008795261383056641 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006768703460693359 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.000690460205078125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006198883056640625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008451938629150391 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009014606475830078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009791851043701172 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005331039428710938 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0004911422729492188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006101131439208984 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00040268898010253906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006768703460693359 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006759166717529297 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006647109985351562 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0006718635559082031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004563331604003906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004961490631103516 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005431175231933594 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000507354736328125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004851818084716797 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005602836608886719 seconds [default1]:Time to load utils op: 0.0006721019744873047 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006537437438964844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Time to load utils op: 0.0005543231964111328 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006384849548339844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00036787986755371094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00042819976806640625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004837512969970703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004749298095703125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006668567657470703 seconds [default6]:Time to load utils op: 0.0005197525024414062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006265640258789062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005033016204833984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00048279762268066406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006721019744873047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009598731994628906 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005350112915039062 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005888938903808594 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005881786346435547 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005884170532226562 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005767345428466797 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005180835723876953 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004336833953857422 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000362396240234375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005853176116943359 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006296634674072266 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Time to load utils op: 0.0005359649658203125 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.000415802001953125 seconds [default5]:Time to load utils op: 0.00046634674072265625 seconds [default6]:Time to load utils op: 0.0004930496215820312 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005474090576171875 seconds [default7]:Time to load utils op: 0.0004303455352783203 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007910728454589844 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008141994476318359 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004932880401611328 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0006999969482421875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006966590881347656 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000629425048828125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006875991821289062 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005297660827636719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.0004830360412597656 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005786418914794922 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005548000335693359 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Time to load utils op: 0.00044727325439453125 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0009672641754150391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0012247562408447266 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0011675357818603516 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0011379718780517578 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008687973022460938 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00048828125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00038933753967285156 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005486011505126953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00042438507080078125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0006999969482421875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000942230224609375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004947185516357422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006072521209716797 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006818771362304688 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0012209415435791016 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004935264587402344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009360313415527344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005450248718261719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005829334259033203 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041174888610839844 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005059242248535156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.00047850608825683594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005099773406982422 seconds [default6]:Time to load utils op: 0.0004360675811767578 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00045800209045410156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005402565002441406 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006043910980224609 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0006535053253173828 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005900859832763672 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006577968597412109 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004706382751464844 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006346702575683594 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007219314575195312 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0007274150848388672 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006368160247802734 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000499725341796875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005793571472167969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005085468292236328 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00044465065002441406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0005609989166259766 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004885196685791016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004048347473144531 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011966228485107422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008993148803710938 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008046627044677734 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0006723403930664062 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00060272216796875 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0009799003601074219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005872249603271484 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005304813385009766 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006687641143798828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0014865398406982422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011224746704101562 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012140274047851562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006089210510253906 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008780956268310547 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00036835670471191406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0005414485931396484 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000598907470703125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006387233734130859 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007491111755371094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006270408630371094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005345344543457031 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default6]:Time to load utils op: 0.0007002353668212891 seconds [default3]:Time to load utils op: 0.0005605220794677734 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-05 14:21:49,792] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-05 14:21:49,793] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-05 14:21:49,793] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-05 14:21:49,793] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-05 14:21:49,793] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:[2022-09-05 14:21:49,819] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-05 14:21:49,819] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:21:49,819] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21355628967285156 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20993494987487793 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20951175689697266 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20924830436706543 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3042900562286377 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004489421844482422 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.2022876739501953 seconds [default0]:[2022-09-05 14:21:50,046] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-05 14:21:50,046] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:21:50,047] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:[2022-09-05 14:21:50,116] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-05 14:21:50,116] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:21:50,116] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.30415821075439453 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3043551445007324 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001750946044921875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0019273757934570312 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0003719329833984375 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0017066001892089844 seconds [default0]:[2022-09-05 14:21:50,139] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-05 14:21:50,140] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:21:50,140] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:[2022-09-05 14:21:50,163] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-05 14:21:50,163] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:21:50,164] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:[2022-09-05 14:21:50,186] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-05 14:21:50,187] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:21:50,187] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00040221214294433594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003333091735839844 seconds [default0]:[2022-09-05 14:21:50,245] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-05 14:21:50,246] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:21:50,246] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-05 14:21:50,268] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-05 14:21:50,268] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:21:50,269] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.65 GB, percent = 7.3% [default0]:[2022-09-05 14:21:50,269] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-05 14:21:50,269] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-05 14:21:50,269] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-05 14:21:50,269] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-05 14:21:50,269] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-05 14:21:50,270] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-05 14:21:50,271] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-05 14:21:50,271] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005092620849609375 seconds [default0]:[2022-09-05 14:21:50,271] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,855] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,855] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,855] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,855] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,855] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:21:50,856] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:21:51,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:21:51,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:21:51,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:21:51,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:21:51,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:21:51,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:21:51,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:22:01,552] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default3]:[2022-09-05 14:22:02,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default2]:[2022-09-05 14:22:02,693] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default3]:[2022-09-05 14:22:03,162] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default3]:[2022-09-05 14:22:03,149] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default7]:[2022-09-05 14:22:03,668] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default6]:[2022-09-05 14:22:03,668] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default3]:[2022-09-05 14:22:04,120] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default3]:[2022-09-05 14:22:04,247] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default3]:[2022-09-05 14:22:04,414] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default3]:[2022-09-05 14:22:04,450] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default3]:[2022-09-05 14:22:04,601] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default7]:[2022-09-05 14:22:04,738] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default7]:[2022-09-05 14:22:04,811] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default7]:[2022-09-05 14:22:04,967] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default3]:[2022-09-05 14:22:05,083] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default6]:[2022-09-05 14:22:05,311] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default7]:[2022-09-05 14:22:05,311] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default3]:[2022-09-05 14:22:05,259] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default3]:[2022-09-05 14:22:05,370] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default7]:[2022-09-05 14:22:05,497] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default3]:[2022-09-05 14:22:05,723] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default3]:[2022-09-05 14:22:06,104] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default3]:[2022-09-05 14:22:06,062] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default2]:[2022-09-05 14:22:06,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default7]:[2022-09-05 14:22:06,137] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default6]:[2022-09-05 14:22:06,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default7]:[2022-09-05 14:22:06,284] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default7]:[2022-09-05 14:22:06,258] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default3]:[2022-09-05 14:22:06,450] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default2]:[2022-09-05 14:22:06,510] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default7]:[2022-09-05 14:22:06,479] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default7]:[2022-09-05 14:22:06,615] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default7]:[2022-09-05 14:22:06,576] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default1]:[2022-09-05 14:22:06,682] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default0]:[2022-09-05 14:22:06,677] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default7]:[2022-09-05 14:22:06,679] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default3]:[2022-09-05 14:22:06,733] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default2]:[2022-09-05 14:22:06,777] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default3]:[2022-09-05 14:22:06,786] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default5]:[2022-09-05 14:22:06,802] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default3]:[2022-09-05 14:22:06,741] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default4]:[2022-09-05 14:22:06,794] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default7]:[2022-09-05 14:22:06,988] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default7]:[2022-09-05 14:22:07,108] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default3]:[2022-09-05 14:22:07,131] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default2]:[2022-09-05 14:22:07,162] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default7]:[2022-09-05 14:22:07,322] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default3]:[2022-09-05 14:22:07,310] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default3]:[2022-09-05 14:22:07,348] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default4]:[2022-09-05 14:22:07,421] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default5]:[2022-09-05 14:22:07,427] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default7]:[2022-09-05 14:22:07,479] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default1]:[2022-09-05 14:22:07,446] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default0]:[2022-09-05 14:22:07,446] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default2]:[2022-09-05 14:22:07,491] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default3]:[2022-09-05 14:22:07,451] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default7]:[2022-09-05 14:22:07,482] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default7]:[2022-09-05 14:22:07,496] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default3]:[2022-09-05 14:22:07,610] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default5]:[2022-09-05 14:22:07,601] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default2]:[2022-09-05 14:22:07,694] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default2]:[2022-09-05 14:22:07,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default6]:[2022-09-05 14:22:07,924] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default7]:[2022-09-05 14:22:07,997] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default6]:[2022-09-05 14:22:07,996] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default6]:[2022-09-05 14:22:07,981] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default7]:[2022-09-05 14:22:07,995] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default3]:[2022-09-05 14:22:07,973] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default4]:[2022-09-05 14:22:08,032] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default6]:[2022-09-05 14:22:08,085] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default6]:[2022-09-05 14:22:08,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default5]:[2022-09-05 14:22:08,217] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default6]:[2022-09-05 14:22:08,154] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default3]:[2022-09-05 14:22:08,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default4]:[2022-09-05 14:22:08,248] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default6]:[2022-09-05 14:22:08,295] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default6]:[2022-09-05 14:22:08,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default7]:[2022-09-05 14:22:08,401] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default7]:[2022-09-05 14:22:08,432] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default1]:[2022-09-05 14:22:08,471] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default5]:[2022-09-05 14:22:08,461] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default4]:[2022-09-05 14:22:08,462] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default1]:[2022-09-05 14:22:08,495] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default0]:[2022-09-05 14:22:08,488] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default7]:[2022-09-05 14:22:08,453] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default2]:[2022-09-05 14:22:08,539] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default6]:[2022-09-05 14:22:08,535] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default5]:[2022-09-05 14:22:08,708] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default0]:[2022-09-05 14:22:08,634] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default4]:[2022-09-05 14:22:08,702] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default2]:[2022-09-05 14:22:08,719] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default3]:[2022-09-05 14:22:08,726] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default2]:[2022-09-05 14:22:08,728] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default2]:[2022-09-05 14:22:08,678] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default1]:[2022-09-05 14:22:08,802] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default0]:[2022-09-05 14:22:08,794] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default5]:[2022-09-05 14:22:08,813] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default7]:[2022-09-05 14:22:08,751] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default6]:[2022-09-05 14:22:08,799] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default7]:[2022-09-05 14:22:08,874] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default4]:[2022-09-05 14:22:08,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default2]:[2022-09-05 14:22:08,855] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default3]:[2022-09-05 14:22:08,854] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default6]:[2022-09-05 14:22:08,904] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default6]:[2022-09-05 14:22:08,951] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default2]:[2022-09-05 14:22:08,996] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default6]:[2022-09-05 14:22:08,981] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default2]:[2022-09-05 14:22:08,988] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default7]:[2022-09-05 14:22:09,049] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default2]:[2022-09-05 14:22:09,111] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default3]:[2022-09-05 14:22:09,035] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default3]:[2022-09-05 14:22:09,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default6]:[2022-09-05 14:22:09,043] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default7]:[2022-09-05 14:22:09,046] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default7]:[2022-09-05 14:22:09,130] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default6]:[2022-09-05 14:22:09,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default6]:[2022-09-05 14:22:09,184] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default7]:[2022-09-05 14:22:09,181] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default4]:[2022-09-05 14:22:09,149] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default5]:[2022-09-05 14:22:09,169] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default4]:[2022-09-05 14:22:09,170] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default6]:[2022-09-05 14:22:09,223] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default5]:[2022-09-05 14:22:09,146] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default3]:[2022-09-05 14:22:09,254] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default2]:[2022-09-05 14:22:09,319] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default2]:[2022-09-05 14:22:09,331] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default1]:[2022-09-05 14:22:09,314] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default2]:[2022-09-05 14:22:09,287] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default3]:[2022-09-05 14:22:09,272] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default3]:[2022-09-05 14:22:09,289] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default4]:[2022-09-05 14:22:09,248] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default2]:[2022-09-05 14:22:09,340] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default5]:[2022-09-05 14:22:09,251] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default5]:[2022-09-05 14:22:09,375] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default2]:[2022-09-05 14:22:09,359] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default4]:[2022-09-05 14:22:09,376] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default6]:[2022-09-05 14:22:09,430] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default0]:[2022-09-05 14:22:09,399] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default1]:[2022-09-05 14:22:09,393] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default0]:[2022-09-05 14:22:09,403] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default1]:[2022-09-05 14:22:09,383] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default2]:[2022-09-05 14:22:09,385] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default3]:[2022-09-05 14:22:09,383] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default4]:[2022-09-05 14:22:09,375] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default5]:[2022-09-05 14:22:09,373] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default6]:[2022-09-05 14:22:09,435] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default5]:[2022-09-05 14:22:09,471] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default4]:[2022-09-05 14:22:09,464] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default7]:[2022-09-05 14:22:09,458] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default0]:[2022-09-05 14:22:09,448] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default4]:[2022-09-05 14:22:09,476] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default6]:[2022-09-05 14:22:09,478] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default7]:[2022-09-05 14:22:09,451] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default5]:[2022-09-05 14:22:09,537] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default5]:[2022-09-05 14:22:09,458] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default1]:[2022-09-05 14:22:09,511] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-05 14:22:09,505] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default2]:[2022-09-05 14:22:09,464] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default7]:[2022-09-05 14:22:09,516] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default3]:[2022-09-05 14:22:09,551] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default4]:[2022-09-05 14:22:09,601] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default4]:[2022-09-05 14:22:09,552] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default5]:[2022-09-05 14:22:09,605] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default6]:[2022-09-05 14:22:09,654] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default0]:[2022-09-05 14:22:09,668] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default2]:[2022-09-05 14:22:09,705] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default2]:[2022-09-05 14:22:09,735] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default5]:[2022-09-05 14:22:09,724] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default5]:[2022-09-05 14:22:09,750] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default1]:[2022-09-05 14:22:09,776] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default0]:[2022-09-05 14:22:09,757] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default6]:[2022-09-05 14:22:09,798] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default1]:[2022-09-05 14:22:09,737] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default0]:[2022-09-05 14:22:09,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default0]:[2022-09-05 14:22:09,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default1]:[2022-09-05 14:22:09,761] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default5]:[2022-09-05 14:22:09,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default4]:[2022-09-05 14:22:09,768] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default6]:[2022-09-05 14:22:09,782] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default0]:[2022-09-05 14:22:09,754] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default7]:[2022-09-05 14:22:09,778] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default0]:[2022-09-05 14:22:09,808] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default1]:[2022-09-05 14:22:09,745] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default2]:[2022-09-05 14:22:09,829] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default1]:[2022-09-05 14:22:09,810] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default2]:[2022-09-05 14:22:09,844] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default0]:[2022-09-05 14:22:09,809] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default1]:[2022-09-05 14:22:09,927] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default0]:[2022-09-05 14:22:09,924] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default1]:[2022-09-05 14:22:09,914] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default4]:[2022-09-05 14:22:09,838] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default0]:[2022-09-05 14:22:09,846] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default0]:[2022-09-05 14:22:09,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default2]:[2022-09-05 14:22:09,851] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default1]:[2022-09-05 14:22:09,919] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default0]:[2022-09-05 14:22:09,915] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default1]:[2022-09-05 14:22:09,963] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default5]:[2022-09-05 14:22:09,975] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default0]:[2022-09-05 14:22:10,031] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default0]:[2022-09-05 14:22:09,960] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default1]:[2022-09-05 14:22:10,034] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default6]:[2022-09-05 14:22:10,031] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default4]:[2022-09-05 14:22:10,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default2]:[2022-09-05 14:22:10,015] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default0]:[2022-09-05 14:22:10,031] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default5]:[2022-09-05 14:22:09,987] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default5]:[2022-09-05 14:22:10,014] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default2]:[2022-09-05 14:22:10,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default0]:[2022-09-05 14:22:10,132] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default2]:[2022-09-05 14:22:10,060] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default3]:[2022-09-05 14:22:10,049] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default1]:[2022-09-05 14:22:10,134] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default0]:[2022-09-05 14:22:10,038] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default5]:[2022-09-05 14:22:10,087] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default4]:[2022-09-05 14:22:10,080] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default2]:[2022-09-05 14:22:10,119] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default5]:[2022-09-05 14:22:10,068] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default4]:[2022-09-05 14:22:10,069] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default4]:[2022-09-05 14:22:10,102] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default5]:[2022-09-05 14:22:10,103] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default1]:[2022-09-05 14:22:10,140] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default5]:[2022-09-05 14:22:10,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default4]:[2022-09-05 14:22:10,077] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default6]:[2022-09-05 14:22:10,048] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default0]:[2022-09-05 14:22:10,143] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default1]:[2022-09-05 14:22:10,072] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default4]:[2022-09-05 14:22:10,071] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default0]:[2022-09-05 14:22:10,065] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default4]:[2022-09-05 14:22:10,229] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default4]:[2022-09-05 14:22:10,189] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default5]:[2022-09-05 14:22:10,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default0]:[2022-09-05 14:22:10,188] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default1]:[2022-09-05 14:22:10,185] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default6]:[2022-09-05 14:22:10,235] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default4]:[2022-09-05 14:22:10,160] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default1]:[2022-09-05 14:22:10,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default1]:[2022-09-05 14:22:10,288] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default0]:[2022-09-05 14:22:10,288] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default1]:[2022-09-05 14:22:10,275] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default5]:[2022-09-05 14:22:10,238] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default0]:[2022-09-05 14:22:10,274] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default6]:[2022-09-05 14:22:10,297] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default6]:[2022-09-05 14:22:10,266] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default1]:[2022-09-05 14:22:10,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default2]:[2022-09-05 14:22:10,290] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default4]:[2022-09-05 14:22:10,252] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default0]:[2022-09-05 14:22:10,280] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default1]:[2022-09-05 14:22:10,245] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default1]:[2022-09-05 14:22:10,281] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default5]:[2022-09-05 14:22:10,261] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default4]:[2022-09-05 14:22:10,258] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default6]:[2022-09-05 14:22:10,339] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default1]:[2022-09-05 14:22:10,356] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default0]:[2022-09-05 14:22:10,352] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default1]:[2022-09-05 14:22:10,360] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default1]:[2022-09-05 14:22:10,419] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default4]:[2022-09-05 14:22:10,473] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default4]:[2022-09-05 14:22:10,530] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default6]:[2022-09-05 14:22:10,496] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default5]:[2022-09-05 14:22:10,534] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default6]:[2022-09-05 14:22:10,451] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default2]:[2022-09-05 14:22:10,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default4]:[2022-09-05 14:22:10,534] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-05 14:22:10,652] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default4]:[2022-09-05 14:22:10,651] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default1]:[2022-09-05 14:22:10,678] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default4]:[2022-09-05 14:22:10,656] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default2]:[2022-09-05 14:22:10,822] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default0]:[2022-09-05 14:22:10,842] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default6]:[2022-09-05 14:22:10,807] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default5]:[2022-09-05 14:22:10,854] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default0]:[2022-09-05 14:22:10,871] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default0]:[2022-09-05 14:22:10,910] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default5]:[2022-09-05 14:22:10,853] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default5]:[2022-09-05 14:22:11,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default1]:[2022-09-05 14:22:11,277] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default4]:[2022-09-05 14:22:11,351] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default1]:[2022-09-05 14:22:11,359] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default5]:[2022-09-05 14:22:11,489] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default4]:[2022-09-05 14:22:14,870] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default0]:[2022-09-05 14:22:15,285] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]: checkpoint version 3.0 [default7]:[2022-09-05 14:22:15,466] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default3]:[2022-09-05 14:22:15,708] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default6]:[2022-09-05 14:22:18,215] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default5]:[2022-09-05 14:22:18,292] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default1]:[2022-09-05 14:22:18,648] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 95000 [default7]:time (ms) | load-checkpoint: 28913.96 [default2]:[2022-09-05 14:22:20,720] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-05 14:22:20 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 266240 [default0]: test: 20480 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... 
[default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
[default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.108187 seconds
[default0]: number of documents: 90897616
[default0]: > dataset split:
[default0]: train:
[default0]: document indices in [0, 90897616) total of 90897616 documents
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.038324 seconds
[default0]: number of documents: 90897616
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.002973 seconds
[default0]: number of documents: 90897616
[default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy
[default0]: loaded indexed file in 0.064 seconds
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.199396 seconds
[default0]: number of documents: 15234080
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [14472376, 15234080) total of 761704 documents
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.051 seconds
[default0]: total number of samples: 221750
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.112919 seconds
[default0]: number of documents: 6142390
[default0]: > dataset split:
[default0]: validation_pretraining:
[default0]: document indices in [5835270, 6142390) total of 307120 documents
[default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_doc_idx.npy
[default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_sample_idx.npy
[default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_shuffle_idx.npy
[default0]: loaded indexed file in 0.029 seconds
[default0]: total number of samples: 136143
[default0]: total number of epochs: 1
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index...
[default0]: creating numpy buffer of mmap...
[default0]: creating memory view of numpy buffer...
[default0]: > finished creating indexed dataset in 0.213071 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.056 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.135201 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.168 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.286867 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.180 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.198178 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.008 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.157813 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.122 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.205750 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.094 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.032577 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.028 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.117308 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.073 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.099076 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.008 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.065713 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.038 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.284776 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.018 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.101638 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.028 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.185139 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.041 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.159257 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.038 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.195896 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.063 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.174175 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.041 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.064698 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.034 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.100459 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.030 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.047528 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.012 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.097829 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.015 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.158042 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.117 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.330714 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.067 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.130383 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.041 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.241894 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.107 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.153564 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.140 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.051228 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.021 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios: [default0]: dataset 0, input: 0.0330676, achieved: 0.0330676 [default0]: dataset 1, input: 0.0112421, achieved: 0.0112421 [default0]: dataset 2, input: 0.130272, achieved: 0.130272 [default0]: dataset 3, input: 0.221712, achieved: 0.221712 [default0]: dataset 4, input: 0.106678, achieved: 0.106678 [default0]: dataset 5, input: 0.00155951, achieved: 0.00155955 [default0]: dataset 6, input: 0.13054, achieved: 0.13054 [default0]: dataset 7, input: 0.010918, achieved: 0.0109181 [default0]: dataset 8, input: 0.000110214, achieved: 0.000110257 [default0]: dataset 9, input: 0.00549238, achieved: 0.00549235 [default0]: dataset 10, input: 0.000402122, achieved: 0.000402094 [default0]: dataset 11, input: 0.00747007, achieved: 0.00747007 [default0]: dataset 12, input: 0.000619047, achieved: 0.000619024 [default0]: dataset 13, input: 0.00103353, achieved: 0.0010336 [default0]: dataset 14, input: 0.000501201, achieved: 0.000501226 [default0]: dataset 15, input: 0.000667277, achieved: 0.000667231 [default0]: dataset 16, input: 0.000359281, achieved: 0.000359326 [default0]: dataset 17, input: 0.000508443, achieved: 0.000508519 [default0]: dataset 18, input: 0.00211373, achieved: 0.0021138 [default0]: dataset 19, input: 0.000912995, achieved: 0.000912961 [default0]: dataset 20, input: 0.00124543, achieved: 0.00124546 [default0]: dataset 21, input: 0.000315887, achieved: 0.00031594 [default0]: dataset 22, input: 0.0813721, achieved: 0.0813721 [default0]: dataset 23, input: 0.0552939, achieved: 0.0552939 [default0]: dataset 24, input: 0.0495415, achieved: 0.0495414 [default0]: dataset 25, input: 0.0246164, achieved: 0.0246163 [default0]: dataset 26, input: 0.120917, achieved: 0.120917 [default0]: dataset 27, input: 0.000517703, achieved: 0.000517666 [default0]:> elapsed time for building blendable dataset indices: 0.33 (sec) [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.046801 seconds [default0]: number of documents: 2940097 [default0]: > dataset split: [default0]: valid: [default0]: document indices in [0, 2940097) total of 2940097 documents [default0]: > building dataset index ... [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES 
[default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.059572 seconds [default0]: number of documents: 2940097 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003690 seconds [default0]: number of documents: 2940097 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.072 seconds [default0]:> finished creating T0 datasets ...
[default1]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES 
[default3]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES 
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 199, in pretrain [default1]: evaluate_and_print_results(prefix, forward_step_func, [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1103, in evaluate_and_print_results [default1]: total_loss_dict = evaluate(forward_step_func, data_iterator, model, verbose) [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1056, in evaluate [default1]: loss = model[0].eval_batch(data_iterator) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 441, in eval_batch [default1]: self._exec_schedule(sched) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1375, in _exec_schedule [default1]: self._exec_instr(**cmd.kwargs) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 789, in _exec_load_micro_batch [default1]: batch = self._next_batch() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 621, in _next_batch [default1]: batch = next(self.data_iterator) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__ [default1]: data = self._next_data() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data [default1]: 
return self._process_data(data) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data [default1]: data.reraise() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise [default1]: raise exception [default1]:IndexError: Caught IndexError in DataLoader worker process 0. [default1]:Original Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop [default1]: data = fetcher.fetch(index) [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch [default1]: data = [self.dataset[idx] for idx in possibly_batched_index] [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp> [default1]: data = [self.dataset[idx] for idx in possibly_batched_index] [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/blendable_dataset.py", line 68, in __getitem__ [default1]: return self.datasets[dataset_idx][sample_idx] [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 251, in __getitem__ [default1]: idx = self.shuffle_idx[idx] [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__ [default1]: res = super().__getitem__(index) [default1]:IndexError: index 557684 is out of bounds for axis 0 with size 521544 [default1]: [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 199, in pretrain [default2]: evaluate_and_print_results(prefix, forward_step_func, [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1103, in evaluate_and_print_results [default2]: total_loss_dict = evaluate(forward_step_func, data_iterator, model, verbose) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1056, in evaluate [default2]: loss = model[0].eval_batch(data_iterator) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 441, in eval_batch [default2]: self._exec_schedule(sched) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1375, in _exec_schedule [default2]: self._exec_instr(**cmd.kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 789, in _exec_load_micro_batch [default2]: batch = self._next_batch() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 621, in _next_batch [default2]: batch = 
next(self.data_iterator) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__ [default2]: data = self._next_data() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data [default2]: return self._process_data(data) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data [default2]: data.reraise() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise [default2]: raise exception [default2]:IndexError: Caught IndexError in DataLoader worker process 0. [default2]:Original Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop [default2]: data = fetcher.fetch(index) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch [default2]: data = [self.dataset[idx] for idx in possibly_batched_index] [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp> [default2]: data = [self.dataset[idx] for idx in possibly_batched_index] [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/blendable_dataset.py", line 68, in __getitem__ [default2]: return self.datasets[dataset_idx][sample_idx] [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 251, in __getitem__ [default2]: idx = self.shuffle_idx[idx] [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__ [default2]: res = super().__getitem__(index) [default2]:IndexError: index 557686 is out of bounds for axis 0 with size 521544 [default2]: [default0]:[after dataloaders are built] datetime: 2022-09-05 14:22:36 [default0]:done with setup ... [default0]:training ... [default0]:[after training is done] datetime: 2022-09-05 14:22:36 [default7]:time (ms) | model-and-optimizer-setup: 37422.91 | train/valid/test-data-iterators-setup: 14671.03 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 199, in pretrain [default5]: evaluate_and_print_results(prefix, forward_step_func, [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1103, in evaluate_and_print_results [default5]: total_loss_dict = evaluate(forward_step_func, data_iterator, model, verbose) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1056, in evaluate [default5]: loss = model[0].eval_batch(data_iterator) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 441, in eval_batch 
[default5]: self._exec_schedule(sched) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1375, in _exec_schedule [default5]: self._exec_instr(**cmd.kwargs) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 789, in _exec_load_micro_batch [default5]: batch = self._next_batch() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 621, in _next_batch [default5]: batch = next(self.data_iterator) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__ [default5]: data = self._next_data() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data [default5]: return self._process_data(data) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data [default5]: data.reraise() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise [default5]: raise exception [default5]:IndexError: Caught IndexError in DataLoader worker process 0. 
[default5]:Original Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop [default5]: data = fetcher.fetch(index) [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch [default5]: data = [self.dataset[idx] for idx in possibly_batched_index] [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp> [default5]: data = [self.dataset[idx] for idx in possibly_batched_index] [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/blendable_dataset.py", line 68, in __getitem__ [default5]: return self.datasets[dataset_idx][sample_idx] [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 251, in __getitem__ [default5]: idx = self.shuffle_idx[idx] [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__ [default5]: res = super().__getitem__(index) [default5]:IndexError: index 557684 is out of bounds for axis 0 with size 521544 [default5]: [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 199, in pretrain [default3]: evaluate_and_print_results(prefix, forward_step_func, [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1103, in evaluate_and_print_results [default3]: total_loss_dict = evaluate(forward_step_func, data_iterator, model, verbose) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1056, in evaluate [default3]: loss = model[0].eval_batch(data_iterator) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 441, in eval_batch [default3]: self._exec_schedule(sched) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1375, in _exec_schedule [default3]: self._exec_instr(**cmd.kwargs) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 789, in _exec_load_micro_batch [default3]: batch = self._next_batch() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 621, in _next_batch [default3]: batch = next(self.data_iterator) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__ [default3]: data = self._next_data() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data [default3]: return self._process_data(data) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", 
line 1373, in _process_data [default3]: data.reraise() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise [default3]: raise exception [default3]:IndexError: Caught IndexError in DataLoader worker process 0. [default3]:Original Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop [default3]: data = fetcher.fetch(index) [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch [default3]: data = [self.dataset[idx] for idx in possibly_batched_index] [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp> [default3]: data = [self.dataset[idx] for idx in possibly_batched_index] [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/blendable_dataset.py", line 68, in __getitem__ [default3]: return self.datasets[dataset_idx][sample_idx] [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 251, in __getitem__ [default3]: idx = self.shuffle_idx[idx] [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__ [default3]: res = super().__getitem__(index) [default3]:IndexError: index 557688 is out of bounds for axis 0 with size 521544 [default3]: [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in <module> [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 199, in pretrain [default0]: evaluate_and_print_results(prefix, forward_step_func, [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1103, in evaluate_and_print_results [default0]: total_loss_dict = evaluate(forward_step_func, data_iterator, model, verbose) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1056, in evaluate [default0]: loss = model[0].eval_batch(data_iterator) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 441, in eval_batch [default0]: self._exec_schedule(sched) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1375, in _exec_schedule [default0]: self._exec_instr(**cmd.kwargs) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 789, in _exec_load_micro_batch [default0]: batch = self._next_batch() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 621, in _next_batch [default0]: batch = next(self.data_iterator) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, 
in __next__ [default0]: data = self._next_data() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data [default0]: return self._process_data(data) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data [default0]: data.reraise() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise [default0]: raise exception [default0]:IndexError: Caught IndexError in DataLoader worker process 0. [default0]:Original Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop [default0]: data = fetcher.fetch(index) [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch [default0]: data = [self.dataset[idx] for idx in possibly_batched_index] [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp> [default0]: data = [self.dataset[idx] for idx in possibly_batched_index] [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/blendable_dataset.py", line 68, in __getitem__ [default0]: return self.datasets[dataset_idx][sample_idx] [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/gpt_dataset.py", line 251, in __getitem__ [default0]: idx = self.shuffle_idx[idx] [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__ [default0]: res = super().__getitem__(index)
[default0]:IndexError: index 557689 is out of bounds for axis 0 with size 521544 [default0]: WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875982 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875983 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875984 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875985 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875986 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875988 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1875989 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3734057 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3734061 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3734062 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3734063 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3734064 closing signal SIGTERM ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 5 (pid: 1875987) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 3734058) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python srun: Job step aborted: Waiting up to 62 seconds for job step to finish. 
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072801 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072802 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750173 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989178 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029702 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078043 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750174 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989179 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029703 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117453 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078044 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072803 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113810 closing signal 
SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068899 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029704 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117454 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517461 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325923 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750175 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060292 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113811 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270026 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068900 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467676 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735487 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989180 closing 
signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517462 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325924 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078045 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060293 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705665 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611037 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270027 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467677 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610274 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072804 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649385 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117455 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735488 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541259 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029705 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690716 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705666 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611038 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113812 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345993 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610275 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649386 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750176 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541260 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517463 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690717 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072805 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068901 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325925 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897762 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 1813298 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060294 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989181 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750177 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345994 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467678 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690718 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072806 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270028 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250122 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417199 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897763 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750178 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813299 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705667 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072807 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611039 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117456 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610276 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690719 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750179 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541261 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113813 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649387 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417200 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517464 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4072808 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250123 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881927 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767841 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690720 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068902 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3750180 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345995 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471608 closing 
signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517465 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989182 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467679 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881928 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029706 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677868 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012032 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078046 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060295 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068903 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138671 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813300 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767842 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690721 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897764 
closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079629 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735489 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505335 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517466 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025608 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471609 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989183 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467680 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012033 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690722 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078047 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677869 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611040 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068904 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417201 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250124 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117457 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138672 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 517467 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505336 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610277 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079630 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989184 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467681 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025609 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3690723 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078048 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068905 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325926 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117458 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1989185 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517468 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610278 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467682 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881929 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270029 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767843 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078049 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2068906 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471610 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117459 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 467683 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610279 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060296 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705668 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897765 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677870 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2078050 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541262 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012034 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735490 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649388 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025610 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138673 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3117460 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079631 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345996 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505337 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611041 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417202 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610280 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897766 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029707 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325927 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345997 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 610281 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813301 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611042 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897767 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767844 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113814 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270030 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471611 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345998 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897768 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611043 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060297 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 3705669 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813302 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 345999 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677871 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541263 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250125 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1897769 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 611044 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060298 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270031 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417203 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813303 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 346000 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881930 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2060299 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813304 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417204 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767845 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079632 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012035 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649389 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735491 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471612 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1813305 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417205 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138674 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025611 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705670 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079633 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505338 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113815 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012036 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250126 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1417206 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471613 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079634 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325928 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881931 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012037 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250127 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767846 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471614 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012038 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079635 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250128 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649390 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029708 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541264 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1471615 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735492 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4012039 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3079636 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113816 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3250129 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677872 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025612 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649391 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025613 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325929 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881932 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270032 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025614 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1649392 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705671 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505339 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1025615 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735493 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2113817 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2325930 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767847 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138675 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3029709 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541265 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2270033 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677873 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2735494 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881933 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505340 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2767848 
closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138676 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1541266 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677874 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3705672 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505341 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3881934 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138677 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1677875 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 505342 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3138678 closing signal SIGTERM slurmstepd: error: *** STEP 1006357.0 ON jean-zay-iam02 CANCELLED AT 2022-09-05T14:27:36 *** Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 345954 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3881889 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 
765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 610235 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run 
main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = self._invoke_run(role) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 517423 got signal: 15 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = self._invoke_run(role) raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run torch.distributed.elastic.multiprocessing.api.SignalException: Process 4072763 got signal: 15 time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2735449 got signal: 15 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return 
launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2269988 got signal: 15 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2068861 got signal: 15 main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 4011993 got signal: 15 Traceback 
(most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2325885 got signal: 15 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent elastic_launch( result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = agent.run() result = self._invoke_run(role) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) result = self._invoke_run(role) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main time.sleep(monitor_interval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1649347 got signal: 15 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 505297 got signal: 15 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code time.sleep(monitor_interval) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main torch.distributed.elastic.multiprocessing.api.SignalException: Process 1417160 got signal: 15 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in 
__call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3138633 got signal: 15 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 
850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3690677 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = agent.run() elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = self._invoke_run(role) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper time.sleep(monitor_interval) result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.api.SignalException: Process 3250084 got signal: 15 time.sleep(monitor_interval) result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3705625 got signal: 15 time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ raise SignalException(f"Process {os.getpid()} got signal: 
{sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 610998 got signal: 15 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1025570 got signal: 15 main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main 
run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1989134 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2078004 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2767803 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main 
return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: 
Process 1541221 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in 
_terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1471570 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1897723 got signal: 15 return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1813259 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in time.sleep(monitor_interval) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main raise 
SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3029662 got signal: 15 main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main run(args) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = self._invoke_run(role) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler result = f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.api.SignalException: Process 3117414 got signal: 15 result = self._invoke_run(role) return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent torch.distributed.elastic.multiprocessing.api.SignalException: Process 467637 got signal: 15 return f(*args, **kwargs) result = agent.run() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run run(args) return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2060248 got signal: 15 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = self._invoke_run(role) result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got 
signal: {sigval}", sigval=sigval) time.sleep(monitor_interval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 1677829 got signal: 15 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3079587 got signal: 15 main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 3750132 got signal: 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent result = agent.run() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper result = f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run result = self._invoke_run(role) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run time.sleep(monitor_interval) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval) torch.distributed.elastic.multiprocessing.api.SignalException: Process 2113771 got signal: 15 WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[default7]:> setting tensorboard ...
[default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72
[default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type.
[default0]:using torch.bfloat16 for parameters ...
[default0]:------------------------ arguments ------------------------
[default0]: abort_on_unmet_fused_kernel_constraints ......... True
[default0]: accumulate_allreduce_grads_in_fp32 .............. True
[default0]: adam_beta1 ...................................... 0.9
[default0]: adam_beta2 ...................................... 0.95
[default0]: adam_eps ........................................ 1e-08
[default0]: adlr_autoresume ................................. False
[default0]: adlr_autoresume_interval ........................ 1000
[default0]: apply_query_key_layer_scaling ................... True
[default0]: apply_residual_connection_post_layernorm ........ False
[default0]: attention_dropout ............................... 0.1
[default0]: attention_softmax_in_fp32 ....................... False
[default0]: bert_binary_head ................................ True
[default0]: bert_load ....................................... None
[default0]: bf16 ............................................ True
[default0]: bias_dropout_fusion ............................. True
[default0]: bias_gelu_fusion ................................ True
[default0]: biencoder_projection_dim ........................ 0
[default0]: biencoder_shared_query_context_model ............ False
[default0]: block_data_path ................................. None
[default0]: checkpoint_activations .......................... True
[default0]: checkpoint_in_cpu ............................... False
[default0]: checkpoint_num_layers ........................... 1
[default0]: clip_grad ....................................... 1.0
[default0]: codecarbon_dir .................................. None
[default0]: consumed_train_samples .......................... 0
[default0]: consumed_train_tokens ........................... 0
[default0]: consumed_valid_samples .......................... 0
[default0]: contigious_checkpointing ........................ False
[default0]: cpu_optimizer ................................... False
[default0]: cpu_torch_adam .................................. False
[default0]: curriculum_learning ............................. False
[default0]: data_impl ....................................... mmap
[default0]: data_parallel_size .............................. 4
[default0]: data_path ....................................... None
[default0]: dataloader_type ................................. single
[default0]: DDP_impl ........................................ local
[default0]: decoder_seq_length .............................. None
[default0]: deepscale ....................................... False
[default0]: deepscale_config ................................ None
[default0]: deepspeed ....................................... True
[default0]: deepspeed_activation_checkpointing .............. True
[default0]: deepspeed_config ................................ ./ds_config.1007214.json
[default0]: deepspeed_mpi ................................... False
[default0]: distribute_checkpointed_activations ............. False
[default0]: distributed_backend ............................. nccl
[default0]: embed_layernorm ................................. True
[default0]: embedding_path .................................. None
[default0]: encoder_seq_length .............................. 2048
[default0]: eod_mask_loss ................................... False
[default0]: eval_interval ................................... 250
[default0]: eval_iters ...................................... 10
[default0]: eval_only ....................................... True
[default0]: evidence_data_path .............................. None
[default0]: exit_duration_in_mins ........................... 5990
[default0]: exit_interval ................................... None
[default0]: ffn_hidden_size ................................. 57344
[default0]: finetune ........................................ False
[default0]: fp16 ............................................ False
[default0]: fp16_lm_cross_entropy ........................... False
[default0]: fp32_residual_connection ........................ False
[default0]: gigaflos_no_embeds .............................. 0
[default0]: global_batch_size ............................... 2048
[default0]: glu_activation .................................. None
[default0]: hidden_dropout .................................. 0.1
[default0]: hidden_size ..................................... 14336
[default0]: hysteresis ...................................... 2
[default0]: ict_head_size ................................... None
[default0]: ict_load ........................................ None
[default0]: img_dim ......................................... 224
[default0]: indexer_batch_size .............................. 128
[default0]: indexer_log_interval ............................ 1000
[default0]: inference ....................................... False
[default0]: init_method_std ................................. 0.0048
[default0]: init_method_xavier_uniform ...................... False
[default0]: initial_loss_scale .............................. 4294967296
[default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf
[default0]: kv_channels ..................................... 128
[default0]: layernorm_epsilon ............................... 1e-05
[default0]: lazy_mpu_init ................................... None
[default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: local_rank ...................................... None
[default0]: log_batch_size_to_tensorboard ................... True
[default0]: log_interval .................................... 1
[default0]: log_learning_rate_to_tensorboard ................ True
[default0]: log_level ....................................... None
[default0]: log_level_replica ............................... None
[default0]: log_loss_scale_to_tensorboard ................... True
[default0]: log_num_zeros_in_grad ........................... False
[default0]: log_params_norm ................................. False
[default0]: log_path ........................................ None
[default0]: log_timers_to_tensorboard ....................... True
[default0]: log_validation_ppl_to_tensorboard ............... True
[default0]: loss_on_targets_only ............................ False
[default0]: loss_scale ...................................... None
[default0]: loss_scale_window ............................... 1000
[default0]: lr .............................................. 2e-05
[default0]: lr_decay_iters .................................. None
[default0]: lr_decay_samples ................................ None
[default0]: lr_decay_style .................................. constant
[default0]: lr_decay_tokens ................................. None
[default0]: lr_warmup_fraction .............................. None
[default0]: lr_warmup_iters ................................. 0
[default0]: lr_warmup_samples ............................... 0
[default0]: make_vocab_size_divisible_by .................... 128
[default0]: mask_prob ....................................... 0.15
[default0]: masked_softmax_fusion ........................... True
[default0]: max_position_embeddings ......................... 2048
[default0]: mean_noise_span_length .......................... None
[default0]: memory_centric_tiled_linear ..................... False
[default0]: merge_file ...................................... None
[default0]: micro_batch_size ................................ 1
[default0]: min_loss_scale .................................. 1.0
[default0]: min_lr .......................................... 0.0
[default0]: mmap_warmup ..................................... False
[default0]: no_load_optim ................................... True
[default0]: no_load_rng ..................................... None
[default0]: no_save_optim ................................... None
[default0]: no_save_rng ..................................... None
[default0]: noise_density ................................... None
[default0]: norm_target_loss ................................ True
[default0]: num_attention_heads ............................. 112
[default0]: num_channels .................................... 3
[default0]: num_classes ..................................... 1000
[default0]: num_layers ...................................... 70
[default0]: num_layers_per_virtual_pipeline_stage ........... None
[default0]: num_workers ..................................... 2
[default0]: onnx_safe ....................................... None
[default0]: openai_gelu ..................................... False
[default0]: optimizer ....................................... adam
[default0]: override_lr_scheduler ........................... False
[default0]: pad_vocab_size_to ............................... 250880
[default0]: params_dtype .................................... torch.bfloat16
[default0]: partition_activations ........................... False
[default0]: patch_dim ....................................... 16
[default0]: pipeline_model_parallel_size .................... 72
[default0]: position_embedding_type ......................... PositionEmbeddingType.alibi
[default0]: pp_partition_method ............................. type:transformer|embedding
[default0]: prefixlm ........................................ False
[default0]: profile_backward ................................ False
[default0]: query_in_block_prob ............................. 0.1
[default0]: rampup_batch_size ............................... None
[default0]: rank ............................................ 0
[default0]: remote_device ................................... none
[default0]: reset_attention_mask ............................ False
[default0]: reset_position_ids .............................. False
[default0]: reset_progress .................................. None
[default0]: retriever_report_topk_accuracies ................ []
[default0]: retriever_score_scaling ......................... False
[default0]: retriever_seq_length ............................ 256
[default0]: reweight_loss_based_on_position_frequency ....... False
[default0]: sample_rate ..................................... 1.0
[default0]: save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq
[default0]: save_interval ................................... 5
[default0]: scatter_gather_tensors_in_pipeline .............. True
[default0]: scattered_embeddings ............................ False
[default0]: seed ............................................ 42
[default0]: seq_length ...................................... 2048
[default0]: sgd_momentum .................................... 0.9
[default0]: short_seq_prob .................................. 0.1
[default0]: skip_train_iteration_range ...................... None
[default0]: split ........................................... None
[default0]: split_transformers .............................. False
[default0]: sync_tp_duplicated_parameters ................... True
[default0]: synchronize_each_layer .......................... False
[default0]: tensor_model_parallel_size ...................... 1
[default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq
[default0]: tensorboard_log_interval ........................ 1
[default0]: tensorboard_queue_size .......................... 5
[default0]: test_weighted_split_paths ....................... None
[default0]: test_weighted_split_paths_path .................. None
[default0]: tile_factor ..................................... 1
[default0]: titles_data_path ................................ None
[default0]: tokenizer_name_or_path .......................... bigscience/tokenizer
[default0]: tokenizer_type .................................. PretrainedFromHF
[default0]: train_iters ..................................... None
[default0]: train_samples ................................... 6348800
[default0]: train_tokens .................................... None
[default0]: train_weighted_split_names ...................... ['train']
[default0]: train_weighted_split_paths ...................... [['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']]
[default0]: train_weighted_split_paths_path ................. None
[default0]: train_weighted_split_splits ..................... [['0:1']]
[default0]: train_weighted_split_weights .................... [['1']]
[default0]: universal_checkpoint ............................ True
[default0]: use_bnb_optimizer ............................... False
[default0]: use_checkpoint_lr_scheduler ..................... False
[default0]: use_contiguous_buffers_in_ddp ................... True
[default0]: use_cpu_initialization .......................... None
[default0]: use_one_sent_docs ............................... False
[default0]: use_pin_memory .................................. False
[default0]: valid_num_workers ............................... 2
[default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid']
[default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document',
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. 
loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... 
[default0]:[2022-09-05 14:30:00,042] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-05 14:30:20,614] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.087 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_masked_softmax_cuda... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module fused_mix_prec_layer_norm_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module fused_mix_prec_layer_norm_cuda... [default0]:>>> done with compiling and loading fused kernels. Compilation time: 7.436 seconds [default0]:time to initialize megatron (seconds): -27.860 [default0]:[after megatron is initialized] datetime: 2022-09-05 14:30:28 [default0]:building GPT model ... [default0]:[2022-09-05 14:30:28,171] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-05 14:30:28,172] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-05 14:30:28,172] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.08 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, 
model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, 
model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, 
ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, 
ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, 
ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, 
ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, 
ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-05 14:30:32,043] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe [default0]:stage=7 layers=1 [default0]: 9: ParallelTransformerLayerPipe [default0]:stage=8 layers=1 [default0]: 10: ParallelTransformerLayerPipe [default0]:stage=9 layers=1 [default0]: 11: ParallelTransformerLayerPipe [default0]:stage=10 layers=1 [default0]: 12: ParallelTransformerLayerPipe [default0]:stage=11 layers=1 [default0]: 13: ParallelTransformerLayerPipe [default0]:stage=12 layers=1 [default0]: 14: ParallelTransformerLayerPipe [default0]:stage=13 layers=1 [default0]: 15: ParallelTransformerLayerPipe [default0]:stage=14 layers=1 [default0]: 16: ParallelTransformerLayerPipe [default0]:stage=15 layers=1 [default0]: 17: ParallelTransformerLayerPipe [default0]:stage=16 layers=1 [default0]: 18: ParallelTransformerLayerPipe [default0]:stage=17 layers=1 [default0]: 19: ParallelTransformerLayerPipe [default0]:stage=18 layers=1 [default0]: 20: ParallelTransformerLayerPipe [default0]:stage=19 layers=1 [default0]: 21: ParallelTransformerLayerPipe [default0]:stage=20 layers=1 [default0]: 22: ParallelTransformerLayerPipe [default0]:stage=21 
layers=1 [default0]: 23: ParallelTransformerLayerPipe [default0]:stage=22 layers=1 [default0]: 24: ParallelTransformerLayerPipe [default0]:stage=23 layers=1 [default0]: 25: ParallelTransformerLayerPipe [default0]:stage=24 layers=1 [default0]: 26: ParallelTransformerLayerPipe [default0]:stage=25 layers=1 [default0]: 27: ParallelTransformerLayerPipe [default0]:stage=26 layers=1 [default0]: 28: ParallelTransformerLayerPipe [default0]:stage=27 layers=1 [default0]: 29: ParallelTransformerLayerPipe [default0]:stage=28 layers=1 [default0]: 30: ParallelTransformerLayerPipe [default0]:stage=29 layers=1 [default0]: 31: ParallelTransformerLayerPipe [default0]:stage=30 layers=1 [default0]: 32: ParallelTransformerLayerPipe [default0]:stage=31 layers=1 [default0]: 33: ParallelTransformerLayerPipe [default0]:stage=32 layers=1 [default0]: 34: ParallelTransformerLayerPipe [default0]:stage=33 layers=1 [default0]: 35: ParallelTransformerLayerPipe [default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 
layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe [default0]:stage=61 layers=1 [default0]: 63: ParallelTransformerLayerPipe [default0]:stage=62 layers=1 [default0]: 64: ParallelTransformerLayerPipe [default0]:stage=63 layers=1 [default0]: 65: ParallelTransformerLayerPipe [default0]:stage=64 layers=1 [default0]: 66: ParallelTransformerLayerPipe [default0]:stage=65 layers=1 [default0]: 67: ParallelTransformerLayerPipe [default0]:stage=66 layers=1 [default0]: 68: ParallelTransformerLayerPipe [default0]:stage=67 layers=1 [default0]: 69: ParallelTransformerLayerPipe [default0]:stage=68 layers=1 [default0]: 70: ParallelTransformerLayerPipe [default0]:stage=69 layers=1 [default0]: 71: ParallelTransformerLayerPipe [default0]:stage=70 layers=3 [default0]: 72: ParallelTransformerLayerPipe [default0]: 73: undo [default0]: 74: MixedFusedLayerNorm [default0]:stage=71 layers=2 [default0]: 75: EmbeddingPipe [default0]: 76: float16_to_fp32 [default0]: loss: CrossEntropy [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default7]:Building extension module utils... [default7]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default7]:ninja: no work to do. [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.32039976119995117 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.3204305171966553 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.32039737701416016 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.32040953636169434 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.3882179260253906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010590553283691406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006961822509765625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.000682830810546875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006384849548339844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006625652313232422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12103271484375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12073016166687012 seconds [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.11656975746154785 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.1167905330657959 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.11644411087036133 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12465500831604004 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.12908697128295898 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12956738471984863 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12941884994506836 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12084650993347168 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10706806182861328 seconds [default6]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1053469181060791 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.1196439266204834 seconds [default6]:Time to load utils op: 0.11932253837585449 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10676741600036621 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.11920332908630371 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10622811317443848 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.11961889266967773 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2254190444946289 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.1152031421661377 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.225419282913208 seconds [default3]:Time to load utils op: 0.11515998840332031 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.11509251594543457 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.12084531784057617 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.1293954849243164 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.1157534122467041 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12919020652770996 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21923565864562988 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.1234731674194336 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.1231389045715332 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.22542357444763184 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.22540926933288574 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2196516990661621 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21959495544433594 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21956443786621094 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.12908387184143066 seconds [default3]:Loading extension module utils... [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default0]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default0]:Building extension module utils... [default0]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21615076065063477 seconds [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default7]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.12020349502563477 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.1192479133605957 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.12937426567077637 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.1233820915222168 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.1233062744140625 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11926984786987305 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.11923670768737793 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12603044509887695 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.1256120204925537 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12603402137756348 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2135465145111084 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21355938911437988 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21546435356140137 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21557092666625977 seconds [default0]:Loading extension module utils... [default0]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.21355390548706055 seconds [default0]:Time to load utils op: 0.2154541015625 seconds [default0]:Time to load utils op: 0.21356582641601562 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.12922930717468262 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2182321548461914 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21823358535766602 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12545418739318848 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21821260452270508 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21821069717407227 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21544575691223145 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.22698593139648438 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11509370803833008 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11521792411804199 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.11511874198913574 seconds [default3]:Time to load utils op: 0.11519098281860352 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.1154625415802002 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11418890953063965 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.11538910865783691 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.11499547958374023 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.11534953117370605 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.11359715461730957 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.1137244701385498 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11490035057067871 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12309885025024414 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12300801277160645 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12292981147766113 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.1179497241973877 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2140207290649414 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21405649185180664 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21398282051086426 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21400046348571777 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11841154098510742 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.12366795539855957 seconds [default0]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2271270751953125 seconds [default0]:Time to load utils op: 0.227097749710083 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.22698497772216797 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.12313699722290039 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.11730074882507324 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.6481132507324219 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.6468410491943359 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12368440628051758 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.12365913391113281 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12367939949035645 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.6470868587493896 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1172800064086914 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20261168479919434 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2144927978515625 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21489906311035156 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.11792850494384766 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21391892433166504 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10554933547973633 seconds [default4]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.12462329864501953 seconds [default4]:Time to load utils op: 0.2027738094329834 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11911487579345703 seconds [default7]:Loading extension module utils... [default2]:Loading extension module utils... [default7]:Time to load utils op: 0.20231890678405762 seconds [default5]:Loading extension module utils... 
[default2]:Time to load utils op: 0.21382451057434082 seconds [default5]:Time to load utils op: 0.2025747299194336 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10834240913391113 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10785722732543945 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.12492728233337402 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.12497377395629883 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.1184535026550293 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20248889923095703 seconds [default2]:Time to load utils op: 0.11816167831420898 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10402894020080566 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10496735572814941 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21644282341003418 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20282626152038574 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21645021438598633 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20238113403320312 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20257997512817383 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.1040949821472168 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20258712768554688 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21320843696594238 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.10414958000183105 seconds [default0]:Time to load utils op: 0.11736440658569336 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2138671875 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21382856369018555 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.10255789756774902 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.10241031646728516 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.10217094421386719 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.10254359245300293 seconds [default2]:Time to load utils op: 0.10548019409179688 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20259833335876465 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21291446685791016 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21451926231384277 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20260214805603027 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2128133773803711 seconds [default0]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2029275894165039 seconds [default0]:Time to load utils op: 0.20235729217529297 seconds [default2]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2126922607421875 seconds [default2]:Time to load utils op: 0.21453046798706055 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21452736854553223 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.21450209617614746 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2023007869720459 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2144930362701416 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21462321281433105 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21444487571716309 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21294116973876953 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2026824951171875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21644902229309082 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21261382102966309 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20247960090637207 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20174813270568848 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21282696723937988 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21262884140014648 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21282744407653809 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21263885498046875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20679330825805664 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21448922157287598 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21382546424865723 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.11157464981079102 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20228862762451172 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20692825317382812 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20690178871154785 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21645593643188477 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20304298400878906 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2025904655456543 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20270729064941406 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2068767547607422 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20229649543762207 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21281790733337402 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2027740478515625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20255470275878906 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2025008201599121 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2148294448852539 seconds [default3]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20246243476867676 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20283985137939453 seconds [default3]:Time to load utils op: 0.2025618553161621 seconds [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.20246243476867676 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20286035537719727 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20258045196533203 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20541691780090332 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21480226516723633 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21364068984985352 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2026681900024414 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.11269569396972656 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21048688888549805 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21482563018798828 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21486830711364746 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20785999298095703 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20259523391723633 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3125572204589844 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2027454376220703 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3124544620513916 seconds [default3]:Time to load utils op: 0.11194586753845215 seconds [default6]:Time to load utils op: 0.11821484565734863 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004699230194091797 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21045827865600586 seconds [default2]:Time to load utils op: 0.1115713119506836 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20784521102905273 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.207472562789917 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20747828483581543 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2026810646057129 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20264530181884766 seconds [default4]:Time to load utils op: 0.11823725700378418 seconds [default5]:Time to load utils op: 0.11823081970214844 seconds [default7]:Time to load utils op: 0.11820220947265625 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20251846313476562 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3125150203704834 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20287680625915527 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2025613784790039 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20251059532165527 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20247912406921387 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20271801948547363 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2143535614013672 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.2124779224395752 seconds [default5]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2026076316833496 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21464943885803223 seconds [default5]:Time to load utils op: 0.21429729461669922 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2102036476135254 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2027587890625 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20246648788452148 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20989751815795898 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21462512016296387 seconds [default1]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20232200622558594 seconds [default1]:Time to load utils op: 0.2026534080505371 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2025315761566162 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20252537727355957 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2024364471435547 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20256400108337402 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20237016677856445 seconds [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.2022719383239746 seconds [default4]:Time to load utils op: 0.20244479179382324 seconds [default7]:Time to load utils op: 0.20173025131225586 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20244169235229492 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20540356636047363 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21571612358093262 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2156834602355957 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20604586601257324 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20263290405273438 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21534299850463867 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20238018035888672 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2052931785583496 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21534109115600586 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20251131057739258 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21535181999206543 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21535444259643555 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20221996307373047 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21569085121154785 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2024974822998047 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.20253252983093262 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024524211883545 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.213545560836792 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21570491790771484 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2024691104888916 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21343016624450684 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20273923873901367 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20263457298278809 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20244956016540527 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21349620819091797 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20259737968444824 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20247578620910645 seconds [default4]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20250248908996582 seconds [default4]:Time to load utils op: 0.21347784996032715 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2025163173675537 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2026219367980957 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21444082260131836 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20245909690856934 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.20238876342773438 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20263314247131348 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2144479751586914 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20240044593811035 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2144300937652588 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.21443557739257812 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.001672506332397461 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0014369487762451172 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.001483917236328125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0011816024780273438 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0013129711151123047 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007963180541992188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0012285709381103516 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007977485656738281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006778240203857422 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0006475448608398438 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0013568401336669922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00043773651123046875 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003483295440673828 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006625652313232422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.000591278076171875 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0005931854248046875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006561279296875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007171630859375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000461578369140625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005900859832763672 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0005967617034912109 seconds [default1]:Time to load utils op: 0.0006308555603027344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005888938903808594 seconds [default3]:Time to load utils op: 0.0004553794860839844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004134178161621094 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004999637603759766 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004596710205078125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006403923034667969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006036758422851562 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default6]:Time to load utils op: 0.0005850791931152344 seconds [default4]:Time to load utils op: 0.0005359649658203125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005638599395751953 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0005753040313720703 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005519390106201172 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00047898292541503906 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006844997406005859 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004138946533203125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006036758422851562 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0007684230804443359 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00033855438232421875 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000732421875 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003409385681152344 seconds [default4]:Time to load utils op: 0.0004687309265136719 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006012916564941406 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.00034880638122558594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007576942443847656 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00067138671875 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007624626159667969 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007441043853759766 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005068778991699219 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00051116943359375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006229877471923828 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004315376281738281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043964385986328125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00066375732421875 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00046706199645996094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0004088878631591797 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004904270172119141 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006086826324462891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004975795745849609 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004506111145019531 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004494190216064453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005059242248535156 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00042319297790527344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004401206970214844 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005922317504882812 seconds [default0]:Time to load utils op: 0.0003819465637207031 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004706382751464844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0006144046783447266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006635189056396484 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004456043243408203 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00047779083251953125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0009317398071289062 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005977153778076172 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0006606578826904297 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007290840148925781 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005941390991210938 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006744861602783203 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.002816915512084961 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007321834564208984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0030012130737304688 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004165172576904297 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005433559417724609 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default4]:Time to load utils op: 0.0006191730499267578 seconds [default0]:Time to load utils op: 0.00045228004455566406 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Time to load utils op: 0.0006918907165527344 seconds [default3]:Time to load utils op: 0.0006403923034667969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00042557716369628906 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00047779083251953125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010106563568115234 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004277229309082031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0007789134979248047 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.001003265380859375 seconds
[default0]:[2022-09-05 14:30:33,846] [INFO] [utils.py:827:see_memory_usage] After Building Model
[default0]:[2022-09-05 14:30:33,847] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB
[default0]:[2022-09-05 14:30:33,847] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.47 GB, percent = 7.2%
[default0]:setting training iterations to 3100
[default0]:> learning rate decay style: constant
[default0]:DeepSpeed is enabled.
[default0]:[2022-09-05 14:30:33,848] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:No modifications detected for re-loaded extension module utils, skipping build step...
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.0004780292510986328 seconds
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:No modifications detected for re-loaded extension module utils, skipping build step...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.0003495216369628906 seconds
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:No modifications detected for re-loaded extension module utils, skipping build step...
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.000453948974609375 seconds
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008914470672607422 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006692409515380859 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004649162292480469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0029599666595458984 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000492095947265625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.003181934356689453 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004429817199707031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005090236663818359 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005495548248291016 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004134178161621094 seconds [default5]:Time to load utils op: 0.0007722377777099609 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006415843963623047 seconds [default4]:Time to load utils op: 0.0003933906555175781 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00046706199645996094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007042884826660156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008351802825927734 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006072521209716797 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0003895759582519531 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0008306503295898438 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008921623229980469 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008294582366943359 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004642009735107422 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0009012222290039062 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0008537769317626953 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008714199066162109 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007786750793457031 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008149147033691406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007877349853515625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005044937133789062 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0005357265472412109 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005841255187988281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004734992980957031 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010514259338378906 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006160736083984375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005033016204833984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004496574401855469 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004131793975830078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003840923309326172 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004086494445800781 seconds [default6]:Time to load utils op: 0.0006189346313476562 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0007495880126953125 seconds [default5]:Time to load utils op: 0.0009489059448242188 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007545948028564453 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005617141723632812 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004076957702636719 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005905628204345703 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0004668235778808594 seconds [default0]:Time to load utils op: 0.0007996559143066406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004324913024902344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003821849822998047 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007615089416503906 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003790855407714844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00042891502380371094 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006039142608642578 seconds [default1]:Loading extension module utils... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006346702575683594 seconds [default2]:Time to load utils op: 0.00047326087951660156 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00045013427734375 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00048542022705078125 seconds [default1]:Time to load utils op: 0.0005106925964355469 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004696846008300781 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00048613548278808594 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00045871734619140625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003993511199951172 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005304813385009766 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004696846008300781 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00043702125549316406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007104873657226562 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000423431396484375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005340576171875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004677772521972656 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008034706115722656 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007653236389160156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007801055908203125 seconds [default1]:Time to load utils op: 0.0007224082946777344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006957054138183594 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006110668182373047 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00042319297790527344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Time to load utils op: 0.0007913112640380859 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008478164672851562 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006349086761474609 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007822513580322266 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00047469139099121094 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Loading extension module utils... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005092620849609375 seconds [default0]:Time to load utils op: 0.0007171630859375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.000843048095703125 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008008480072021484 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007805824279785156 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005087852478027344 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008537769317626953 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0010678768157958984 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00039124488830566406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004394054412841797 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005283355712890625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004916191101074219 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006365776062011719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008502006530761719 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005295276641845703 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.001129150390625 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006110668182373047 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007472038269042969 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006959438323974609 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007886886596679688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003960132598876953 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005407333374023438 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009479522705078125 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008754730224609375 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005872249603271484 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007870197296142578 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008056163787841797 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007369518280029297 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005443096160888672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003921985626220703 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.00041866302490234375 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00048160552978515625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004131793975830078 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0011029243469238281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005338191986083984 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004169940948486328 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.000518798828125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0014195442199707031 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004222393035888672 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0013747215270996094 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00046944618225097656 seconds [default4]:Time to load utils op: 0.00041413307189941406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005781650543212891 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00036144256591796875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011165142059326172 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004413127899169922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004210472106933594 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004448890686035156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0006432533264160156 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00042748451232910156 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000457763671875 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0003483295440673828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006380081176757812 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007081031799316406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0007584095001220703 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.000629425048828125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005710124969482422 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00058746337890625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007834434509277344 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00043463706970214844 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006046295166015625 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0015909671783447266 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0015265941619873047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0010075569152832031 seconds [default5]:Time to load utils op: 0.0008502006530761719 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007765293121337891 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Time to load utils op: 0.0005514621734619141 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005617141723632812 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004649162292480469 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00043892860412597656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007412433624267578 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005841255187988281 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000431060791015625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default2]:Time to load utils op: 0.0005519390106201172 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006895065307617188 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007772445678710938 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.000911712646484375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.0006177425384521484 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007367134094238281 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006303787231445312 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005035400390625 seconds [default0]:Time to load utils op: 0.000637054443359375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.001598358154296875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006082057952880859 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006422996520996094 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007429122924804688 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006725788116455078 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006413459777832031 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006458759307861328 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007567405700683594 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00039076805114746094 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0015938282012939453 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0017943382263183594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0014913082122802734 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0016970634460449219 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0017428398132324219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-05 14:30:34,578] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-05 14:30:34,579] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-05 14:30:34,579] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-05 14:30:34,579] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-05 14:30:34,579] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default0]:[2022-09-05 14:30:34,611] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-05 14:30:34,612] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:30:34,612] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3049798011779785 seconds [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2810537815093994 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3049178123474121 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20374536514282227 seconds [default0]:[2022-09-05 14:30:34,846] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30400657653808594 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005209445953369141 seconds [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.3063778877258301 seconds [default0]:[2022-09-05 14:30:34,846] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:30:34,846] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:30:34,904] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 [default0]:[2022-09-05 14:30:34,904] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:30:34,904] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:30:34,932] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 [default0]:[2022-09-05 14:30:34,933] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:30:34,933] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30605435371398926 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30618810653686523 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00044035911560058594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004432201385498047 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0013267993927001953 seconds [default0]:[2022-09-05 14:30:34,961] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 [default0]:[2022-09-05 14:30:34,962] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:30:34,962] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default0]:[2022-09-05 14:30:34,990] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer [default0]:[2022-09-05 14:30:34,990] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB [default0]:[2022-09-05 14:30:34,990] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.63 GB, percent = 7.3% [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004930496215820312 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0014548301696777344 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0016355514526367188 seconds [default0]:[2022-09-05 14:30:35,055] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer [default0]:[2022-09-05 14:30:35,055] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:30:35,055] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:[2022-09-05 14:30:35,083] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer [default0]:[2022-09-05 14:30:35,084] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB [default0]:[2022-09-05 14:30:35,084] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.64 GB, percent = 7.3% [default0]:[2022-09-05 14:30:35,084] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [default0]:[2022-09-05 14:30:35,084] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler [default0]:[2022-09-05 14:30:35,084] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [default0]:[2022-09-05 14:30:35,084] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)] [default0]:[2022-09-05 14:30:35,084] [INFO] [config.py:987:print] DeepSpeedEngine configuration: [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] activation_checkpointing_config { [default0]: "partition_activations": false, [default0]: "contiguous_memory_optimization": false, [default0]: "cpu_checkpointing": false, [default0]: "number_checkpoints": null, [default0]: "synchronize_checkpoint_boundary": false, [default0]: "profile": false [default0]:} [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] amp_enabled .................. False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] amp_params ................... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] autotuning_config ............ { [default0]: "enabled": false, [default0]: "start_step": null, [default0]: "end_step": null, [default0]: "metric_path": null, [default0]: "arg_mappings": null, [default0]: "metric": "throughput", [default0]: "model_info": null, [default0]: "results_dir": null, [default0]: "exps_dir": null, [default0]: "overwrite": true, [default0]: "fast": true, [default0]: "start_profile_step": 3, [default0]: "end_profile_step": 5, [default0]: "tuner_type": "gridsearch", [default0]: "tuner_early_stopping": 5, [default0]: "tuner_num_trials": 50, [default0]: "model_info_path": null, [default0]: "mp_size": 1, [default0]: "max_train_batch_size": null, [default0]: "min_train_batch_size": 1, [default0]: "max_train_micro_batch_size_per_gpu": 1.024000e+03, [default0]: "min_train_micro_batch_size_per_gpu": 1, [default0]: "num_tuning_micro_batch_sizes": 3 [default0]:} [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] bfloat16_enabled ............. True [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] comms_config ................. [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] communication_data_type ...... None [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] curriculum_enabled ........... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] curriculum_params ............ False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] dataloader_drop_last ......... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] disable_allgather ............ False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] dump_state ................... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_enabled ........... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_layer_name ........ 
bert.encoder.layer [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] eigenvalue_verbose ........... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] elasticity_enabled ........... False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] flops_profiler_config ........ { [default0]: "enabled": false, [default0]: "profile_step": 1, [default0]: "module_depth": -1, [default0]: "top_modules": 1, [default0]: "detailed": true, [default0]: "output_file": null [default0]:} [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] fp16_auto_cast ............... None [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] fp16_enabled ................. False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] global_rank .................. 0 [default0]:[2022-09-05 14:30:35,085] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] gradient_clipping ............ 1.0 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] load_universal_checkpoint .... True [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] loss_scale ................... 
1.0 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] memory_breakdown ............. False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] monitor_config ............... [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] nebula_config ................ { [default0]: "enabled": false, [default0]: "persistent_storage_path": null, [default0]: "persistent_time_interval": 100, [default0]: "num_of_version_in_retention": 2, [default0]: "enable_nebula_load": true, [default0]: "load_path": null [default0]:} [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] optimizer_name ............... None [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] optimizer_params ............. None [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] pld_enabled .................. False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] pld_params ................... False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] prescale_gradients ........... False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] scheduler_name ............... None [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] scheduler_params ............. None [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] sparse_attention ............. None [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] steps_per_print .............. 2000 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] train_batch_size ............. 
2048 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] wall_clock_breakdown ......... False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] world_size ................... 4 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] zero_allow_untested_optimizer False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] zero_enabled ................. False [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:991:print] zero_optimization_stage ...... 
0 [default0]:[2022-09-05 14:30:35,086] [INFO] [config.py:976:print_user_config] json = { [default0]: "train_micro_batch_size_per_gpu": 1, [default0]: "train_batch_size": 2.048000e+03, [default0]: "gradient_clipping": 1.0, [default0]: "zero_optimization": { [default0]: "stage": 0 [default0]: }, [default0]: "bf16": { [default0]: "enabled": true [default0]: }, [default0]: "steps_per_print": 2.000000e+03, [default0]: "wall_clock_breakdown": false, [default0]: "checkpoint": { [default0]: "load_universal": true [default0]: } [default0]:} [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005412101745605469 seconds [default0]:[2022-09-05 14:30:35,087] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1 [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 
(2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,725] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:30:35,724] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:30:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:30:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:30:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:30:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:30:36,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:30:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:30:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:30:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:30:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:30:44,357] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default3]:[2022-09-05 14:30:47,106] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default7]:[2022-09-05 14:30:47,748] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default3]:[2022-09-05 14:30:48,057] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default3]:[2022-09-05 14:30:48,310] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default6]:[2022-09-05 14:30:48,355] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default7]:[2022-09-05 14:30:48,369] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default2]:[2022-09-05 14:30:48,434] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default3]:[2022-09-05 14:30:48,657] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default3]:[2022-09-05 14:30:48,768] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default3]:[2022-09-05 14:30:48,863] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default7]:[2022-09-05 14:30:48,906] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default3]:[2022-09-05 14:30:49,094] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default7]:[2022-09-05 14:30:49,146] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default6]:[2022-09-05 14:30:49,143] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default7]:[2022-09-05 14:30:49,321] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default4]:[2022-09-05 14:30:49,507] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default5]:[2022-09-05 14:30:49,505] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default3]:[2022-09-05 14:30:49,766] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default7]:[2022-09-05 14:30:49,940] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default3]:[2022-09-05 14:30:50,052] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default7]:[2022-09-05 14:30:50,051] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default6]:[2022-09-05 14:30:50,219] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default3]:[2022-09-05 14:30:50,393] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default3]:[2022-09-05 14:30:50,412] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default7]:[2022-09-05 14:30:50,724] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default3]:[2022-09-05 14:30:50,821] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default2]:[2022-09-05 14:30:50,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default3]:[2022-09-05 14:30:50,870] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default2]:[2022-09-05 14:30:50,881] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default7]:[2022-09-05 14:30:50,877] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default2]:[2022-09-05 14:30:51,024] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default7]:[2022-09-05 14:30:51,096] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default7]:[2022-09-05 14:30:51,126] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default6]:[2022-09-05 14:30:51,163] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default6]:[2022-09-05 14:30:51,338] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default7]:[2022-09-05 14:30:51,332] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default6]:[2022-09-05 14:30:51,304] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default2]:[2022-09-05 14:30:51,464] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default3]:[2022-09-05 14:30:51,402] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default2]:[2022-09-05 14:30:51,416] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default3]:[2022-09-05 14:30:51,476] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default3]:[2022-09-05 14:30:51,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default5]:[2022-09-05 14:30:51,511] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default4]:[2022-09-05 14:30:51,502] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default6]:[2022-09-05 14:30:51,768] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default7]:[2022-09-05 14:30:51,689] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default3]:[2022-09-05 14:30:51,780] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default1]:[2022-09-05 14:30:51,833] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default0]:[2022-09-05 14:30:51,835] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default1]:[2022-09-05 14:30:51,872] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default0]:[2022-09-05 14:30:51,873] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default7]:[2022-09-05 14:30:51,927] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default5]:[2022-09-05 14:30:51,992] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default2]:[2022-09-05 14:30:52,088] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default3]:[2022-09-05 14:30:52,128] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default7]:[2022-09-05 14:30:52,113] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default4]:[2022-09-05 14:30:52,091] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default7]:[2022-09-05 14:30:52,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default6]:[2022-09-05 14:30:52,068] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default7]:[2022-09-05 14:30:52,154] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default5]:[2022-09-05 14:30:52,087] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default4]:[2022-09-05 14:30:52,094] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default6]:[2022-09-05 14:30:52,158] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default5]:[2022-09-05 14:30:52,087] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default6]:[2022-09-05 14:30:52,129] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default2]:[2022-09-05 14:30:52,252] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default3]:[2022-09-05 14:30:52,256] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default6]:[2022-09-05 14:30:52,218] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default7]:[2022-09-05 14:30:52,296] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default4]:[2022-09-05 14:30:52,381] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default4]:[2022-09-05 14:30:52,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 20 [default5]:[2022-09-05 14:30:52,525] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default5]:[2022-09-05 14:30:52,548] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default3]:[2022-09-05 14:30:52,661] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default0]:[2022-09-05 14:30:52,568] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default3]:[2022-09-05 14:30:52,586] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default2]:[2022-09-05 14:30:52,582] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default4]:[2022-09-05 14:30:52,617] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default5]:[2022-09-05 14:30:52,627] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default2]:[2022-09-05 14:30:52,663] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default2]:[2022-09-05 14:30:52,685] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default3]:[2022-09-05 14:30:52,671] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default7]:[2022-09-05 14:30:52,762] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default7]:[2022-09-05 14:30:52,713] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default6]:[2022-09-05 14:30:52,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default5]:[2022-09-05 14:30:52,851] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default6]:[2022-09-05 14:30:52,803] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default3]:[2022-09-05 14:30:52,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default6]:[2022-09-05 14:30:52,826] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default6]:[2022-09-05 14:30:52,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default6]:[2022-09-05 14:30:52,852] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default3]:[2022-09-05 14:30:52,868] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default2]:[2022-09-05 14:30:52,832] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default1]:[2022-09-05 14:30:52,928] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default0]:[2022-09-05 14:30:52,927] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default1]:[2022-09-05 14:30:52,928] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default7]:[2022-09-05 14:30:52,915] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default7]:[2022-09-05 14:30:52,915] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default4]:[2022-09-05 14:30:53,048] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default5]:[2022-09-05 14:30:53,057] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default2]:[2022-09-05 14:30:53,019] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default4]:[2022-09-05 14:30:53,012] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default7]:[2022-09-05 14:30:52,998] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default5]:[2022-09-05 14:30:53,013] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default6]:[2022-09-05 14:30:53,163] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default6]:[2022-09-05 14:30:53,131] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default3]:[2022-09-05 14:30:53,130] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default7]:[2022-09-05 14:30:53,168] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default2]:[2022-09-05 14:30:53,148] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default5]:[2022-09-05 14:30:53,133] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default4]:[2022-09-05 14:30:53,136] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default7]:[2022-09-05 14:30:53,172] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default7]:[2022-09-05 14:30:53,198] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default6]:[2022-09-05 14:30:53,229] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default2]:[2022-09-05 14:30:53,260] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default6]:[2022-09-05 14:30:53,209] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default6]:[2022-09-05 14:30:53,221] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default4]:[2022-09-05 14:30:53,185] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default7]:[2022-09-05 14:30:53,210] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default5]:[2022-09-05 14:30:53,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default2]:[2022-09-05 14:30:53,199] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default3]:[2022-09-05 14:30:53,268] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default2]:[2022-09-05 14:30:53,263] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default7]:[2022-09-05 14:30:53,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default6]:[2022-09-05 14:30:53,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default7]:[2022-09-05 14:30:53,358] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default6]:[2022-09-05 14:30:53,315] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default7]:[2022-09-05 14:30:53,329] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default6]:[2022-09-05 14:30:53,374] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default3]:[2022-09-05 14:30:53,452] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default1]:[2022-09-05 14:30:53,447] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default4]:[2022-09-05 14:30:53,459] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default0]:[2022-09-05 14:30:53,444] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default7]:[2022-09-05 14:30:53,528] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default6]:[2022-09-05 14:30:53,530] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default7]:[2022-09-05 14:30:53,524] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default6]:[2022-09-05 14:30:53,526] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default0]:[2022-09-05 14:30:53,526] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default2]:[2022-09-05 14:30:53,536] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default5]:[2022-09-05 14:30:53,563] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default4]:[2022-09-05 14:30:53,557] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default6]:[2022-09-05 14:30:53,534] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default7]:[2022-09-05 14:30:53,552] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default1]:[2022-09-05 14:30:53,631] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default0]:[2022-09-05 14:30:53,626] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default0]:[2022-09-05 14:30:53,694] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default5]:[2022-09-05 14:30:53,741] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default4]:[2022-09-05 14:30:53,669] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default2]:[2022-09-05 14:30:53,681] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default5]:[2022-09-05 14:30:53,677] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default1]:[2022-09-05 14:30:53,749] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default3]:[2022-09-05 14:30:53,753] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default5]:[2022-09-05 14:30:53,747] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default4]:[2022-09-05 14:30:53,749] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default0]:[2022-09-05 14:30:53,745] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default2]:[2022-09-05 14:30:53,751] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default5]:[2022-09-05 14:30:53,749] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default1]:[2022-09-05 14:30:53,698] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default2]:[2022-09-05 14:30:53,713] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default4]:[2022-09-05 14:30:53,704] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default1]:[2022-09-05 14:30:53,740] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default1]:[2022-09-05 14:30:53,776] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default2]:[2022-09-05 14:30:53,791] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default5]:[2022-09-05 14:30:53,817] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default4]:[2022-09-05 14:30:53,859] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default4]:[2022-09-05 14:30:53,776] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default2]:[2022-09-05 14:30:53,836] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default1]:[2022-09-05 14:30:53,827] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default0]:[2022-09-05 14:30:53,930] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default6]:[2022-09-05 14:30:53,896] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default1]:[2022-09-05 14:30:53,939] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default2]:[2022-09-05 14:30:53,947] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default1]:[2022-09-05 14:30:53,923] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default1]:[2022-09-05 14:30:53,942] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default6]:[2022-09-05 14:30:53,871] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default0]:[2022-09-05 14:30:53,888] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default1]:[2022-09-05 14:30:53,889] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default4]:[2022-09-05 14:30:53,917] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default0]:[2022-09-05 14:30:53,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default1]:[2022-09-05 14:30:53,936] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default0]:[2022-09-05 14:30:53,940] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default0]:[2022-09-05 14:30:53,972] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default1]:[2022-09-05 14:30:53,973] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default3]:[2022-09-05 14:30:53,936] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default0]:[2022-09-05 14:30:53,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default1]:[2022-09-05 14:30:53,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default5]:[2022-09-05 14:30:53,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default0]:[2022-09-05 14:30:53,987] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default7]:[2022-09-05 14:30:53,972] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default0]:[2022-09-05 14:30:53,989] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default0]:[2022-09-05 14:30:54,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default1]:[2022-09-05 14:30:54,010] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default4]:[2022-09-05 14:30:54,035] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default2]:[2022-09-05 14:30:54,043] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default5]:[2022-09-05 14:30:54,039] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default1]:[2022-09-05 14:30:53,989] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default0]:[2022-09-05 14:30:53,985] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default0]:[2022-09-05 14:30:54,042] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default1]:[2022-09-05 14:30:54,037] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default4]:[2022-09-05 14:30:53,985] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-05 14:30:53,988] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default6]:[2022-09-05 14:30:53,995] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default0]:[2022-09-05 14:30:53,993] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default5]:[2022-09-05 14:30:54,075] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default4]:[2022-09-05 14:30:54,071] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default0]:[2022-09-05 14:30:54,108] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default5]:[2022-09-05 14:30:54,099] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default0]:[2022-09-05 14:30:54,142] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default1]:[2022-09-05 14:30:54,144] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default0]:[2022-09-05 14:30:54,076] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default4]:[2022-09-05 14:30:54,160] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default4]:[2022-09-05 14:30:54,110] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default5]:[2022-09-05 14:30:54,111] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default2]:[2022-09-05 14:30:54,208] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default0]:[2022-09-05 14:30:54,238] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default7]:[2022-09-05 14:30:54,223] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default1]:[2022-09-05 14:30:54,169] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default5]:[2022-09-05 14:30:54,233] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default4]:[2022-09-05 14:30:54,231] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default6]:[2022-09-05 14:30:54,252] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default3]:[2022-09-05 14:30:54,207] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default2]:[2022-09-05 14:30:54,273] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default2]:[2022-09-05 14:30:54,198] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default3]:[2022-09-05 14:30:54,188] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default0]:[2022-09-05 14:30:54,180] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default6]:[2022-09-05 14:30:54,198] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default2]:[2022-09-05 14:30:54,302] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default1]:[2022-09-05 14:30:54,354] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default5]:[2022-09-05 14:30:54,274] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default4]:[2022-09-05 14:30:54,276] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default3]:[2022-09-05 14:30:54,316] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default0]:[2022-09-05 14:30:54,287] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default4]:[2022-09-05 14:30:54,346] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default2]:[2022-09-05 14:30:54,337] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default5]:[2022-09-05 14:30:54,349] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default2]:[2022-09-05 14:30:54,315] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default3]:[2022-09-05 14:30:54,422] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default6]:[2022-09-05 14:30:54,452] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default3]:[2022-09-05 14:30:54,460] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default1]:[2022-09-05 14:30:54,452] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default0]:[2022-09-05 14:30:54,453] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default1]:[2022-09-05 14:30:54,397] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default0]:[2022-09-05 14:30:54,401] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default4]:[2022-09-05 14:30:54,416] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default1]:[2022-09-05 14:30:54,454] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default1]:[2022-09-05 14:30:54,495] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default2]:[2022-09-05 14:30:54,488] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default0]:[2022-09-05 14:30:54,486] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default4]:[2022-09-05 14:30:54,494] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default2]:[2022-09-05 14:30:54,549] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default0]:[2022-09-05 14:30:54,538] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default1]:[2022-09-05 14:30:54,513] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default6]:[2022-09-05 14:30:54,483] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default1]:[2022-09-05 14:30:54,541] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default2]:[2022-09-05 14:30:54,590] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default4]:[2022-09-05 14:30:54,647] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default2]:[2022-09-05 14:30:54,586] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default5]:[2022-09-05 14:30:54,688] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default4]:[2022-09-05 14:30:54,684] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default0]:[2022-09-05 14:30:54,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default1]:[2022-09-05 14:30:54,732] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default1]:[2022-09-05 14:30:54,748] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default5]:[2022-09-05 14:30:54,725] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default5]:[2022-09-05 14:30:54,724] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default1]:[2022-09-05 14:30:54,819] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-05 14:30:54,787] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default0]:[2022-09-05 14:30:54,851] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default1]:[2022-09-05 14:30:55,052] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default0]:[2022-09-05 14:30:55,060] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default6]:[2022-09-05 14:30:55,010] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default5]:[2022-09-05 14:30:55,014] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default4]:[2022-09-05 14:30:55,096] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default5]:[2022-09-05 14:30:55,097] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default4]:[2022-09-05 14:30:55,162] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default4]:[2022-09-05 14:30:55,139] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default5]:[2022-09-05 14:30:55,369] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default5]:[2022-09-05 14:30:55,436] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default1]:[2022-09-05 14:30:55,418] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default4]:[2022-09-05 14:30:55,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default3]:[2022-09-05 14:30:59,115] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default7]:[2022-09-05 14:30:59,214] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default5]:[2022-09-05 14:31:01,103] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default2]:[2022-09-05 14:31:01,274] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default1]:[2022-09-05 14:31:01,354] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 1 [default4]:[2022-09-05 14:31:03,089] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default0]:[2022-09-05 14:31:04,317] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]: checkpoint version 3.0 [default7]:time (ms) | load-checkpoint: 29111.53 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 95000 [default6]:[2022-09-05 14:31:05,710] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-05 14:31:05 [default0]:> building train, validation, and test datasets ... [default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 266240 [default0]: test: 20480 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.055073 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.030988 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003826 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.047 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.206310 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.103 seconds [default0]: total number of samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.128464 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.064 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.180382 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.048 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.149222 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.040 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.129514 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.176 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.237798 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.023 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.114679 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.202 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.165318 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.147 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.072181 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.270099 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.048 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.122219 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.014 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.162972 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.041 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.361526 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.044 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.100491 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.013 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.414640 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.026 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.432908 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.017 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.096834 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.036 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.301389 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.091 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.183908 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.065 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.128002 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.053 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.404372 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.042 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.140436 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.011 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.221362 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.173 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.384838 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.102 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.308195 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.122 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.125814 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.101 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.061324 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.238 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.023844 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.018 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios: [default0]: dataset 0, input: 0.0330676, achieved: 0.0330676 [default0]: dataset 1, input: 0.0112421, achieved: 0.0112421 [default0]: dataset 2, input: 0.130272, achieved: 0.130272 [default0]: dataset 3, input: 0.221712, achieved: 0.221712 [default0]: dataset 4, input: 0.106678, achieved: 0.106678 [default0]: dataset 5, input: 0.00155951, achieved: 0.00155955 [default0]: dataset 6, input: 0.13054, achieved: 0.13054 [default0]: dataset 7, input: 0.010918, achieved: 0.0109181 [default0]: dataset 8, input: 0.000110214, achieved: 0.000110257 [default0]: dataset 9, input: 0.00549238, achieved: 0.00549235 [default0]: dataset 10, input: 0.000402122, achieved: 0.000402094 [default0]: dataset 11, input: 0.00747007, achieved: 0.00747007 [default0]: dataset 12, input: 0.000619047, achieved: 0.000619024 [default0]: dataset 13, input: 0.00103353, achieved: 0.0010336 [default0]: dataset 14, input: 0.000501201, achieved: 0.000501226 [default0]: dataset 15, input: 0.000667277, achieved: 0.000667231 [default0]: dataset 16, input: 0.000359281, achieved: 0.000359326 [default0]: dataset 17, input: 0.000508443, achieved: 0.000508519 [default0]: dataset 18, input: 0.00211373, achieved: 0.0021138 [default0]: dataset 19, input: 0.000912995, achieved: 0.000912961 [default0]: dataset 20, input: 0.00124543, achieved: 0.00124546 [default0]: dataset 21, input: 0.000315887, achieved: 0.00031594 [default0]: dataset 22, input: 0.0813721, achieved: 0.0813721 [default0]: dataset 23, input: 0.0552939, achieved: 0.0552939 [default0]: dataset 24, input: 0.0495415, achieved: 0.0495414 [default0]: dataset 25, input: 0.0246164, achieved: 0.0246163 [default0]: dataset 26, input: 0.120917, achieved: 0.120917 [default0]: dataset 27, input: 0.000517703, achieved: 0.000517666 [default0]:> elapsed time for building blendable dataset indices: 0.33 (sec) [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.015865 seconds [default0]: number of documents: 2940097 [default0]: > dataset split: [default0]: valid: [default0]: document indices in [0, 2940097) total of 2940097 documents [default0]: > building dataset index ... [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: pretrain( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 
83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.027263 seconds [default0]: number of documents: 2940097 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... 
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.003366 seconds [default0]: number of documents: 2940097 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_shuffle_idx.npy [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, 
**kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = 
build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, 
**kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = 
build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, 
**kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = 
build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default0]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]: return f(*args, **kwargs) [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default3]:TMP RESETTING CONSSAMPLES [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() 
[default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent 
call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: 
pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]:Traceback (most recent call last): [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default1]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return 
f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, 
**kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Traceback (most recent call 
last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]: return f(*args, **kwargs) [default3]: main() [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, 
in wrapper [default4]: return f(*args, **kwargs) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default0]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: 
pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default2]: batch_sampler = MegatronPretrainingSampler( [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: 
pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:TMP RESETTING CONSSAMPLES 
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: 
pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]: assert self.consumed_samples < self.total_samples, \ [default2]: main() [default3]:Traceback (most recent call last): [default4]:AssertionError: no samples left to 
consume: 165854565, 12547659 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default2]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to 
consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:TMP RESETTING CONSSAMPLES [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default7]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default1]: assert self.consumed_samples < self.total_samples, \ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: main() [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:GOTCONSUMEDSAMPLES 178402224 
5030720 [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Traceback (most recent call last): [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) 
[default1]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: pretrain( [default4]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: main() [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: batch_sampler = MegatronPretrainingSampler( [default0]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default4]: batch_sampler = MegatronPretrainingSampler( [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default2]:Traceback (most recent call last): [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:TMP RESETTING CONSSAMPLES [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: main() [default2]: return f(*args, **kwargs) [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: main() [default4]: return f(*args, **kwargs) [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default1]: return f(*args, **kwargs) [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: assert self.consumed_samples < self.total_samples, \ [default1]: pretrain( [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]:TMP RESETTING CONSSAMPLES [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default3]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:TMP RESETTING CONSSAMPLES [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: batch_sampler = MegatronPretrainingSampler( [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: return f(*args, **kwargs) [default1]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: 
pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]: loaded indexed file in 0.073 seconds [default0]:> finished creating T0 datasets ... 
[default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in 
[default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]:TMP RESETTING CONSSAMPLES [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: main() [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]:Traceback (most recent call last): 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default2]: pretrain( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:TMP RESETTING CONSSAMPLES [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: batch_sampler = MegatronPretrainingSampler( [default2]: assert self.consumed_samples < self.total_samples, \ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:Traceback (most recent call last): [default3]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: batch_sampler = MegatronPretrainingSampler( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: 
pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]:Traceback (most 
recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: train_dataloader = 
build_pretraining_data_loader( [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: batch_sampler = MegatronPretrainingSampler( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = 
MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert 
self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, 
in pretrain [default2]: assert self.consumed_samples < self.total_samples, \ [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: 
batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 
12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: 
batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]:Traceback (most recent call last): [default4]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 207, in [default4]: main() [default0]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default5]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: main() [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( 
[default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: pretrain( [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]: main() [default3]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
[default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default0]:Traceback (most recent call last): [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default2]: return f(*args, **kwargs) [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:TMP RESETTING CONSSAMPLES 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: main() [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]:GOTCONSUMEDSAMPLES 178402224 
5030720 [default2]: return f(*args, **kwargs) [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]:Traceback (most recent call last): [default3]: main() [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: batch_sampler = MegatronPretrainingSampler( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default6]:TMP RESETTING CONSSAMPLES [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( 
[default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default3]: assert self.consumed_samples < self.total_samples, \ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: return f(*args, **kwargs) [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default2]: main() 
[default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ 
[default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default3]:GOTCONSUMEDSAMPLES 178402224 
5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert 
self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES 
[default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: pretrain( [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: return f(*args, **kwargs) 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default1]: main() [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default0]: pretrain( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default1]: pretrain( [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]:Traceback (most recent call last): [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: batch_sampler = MegatronPretrainingSampler( [default1]: assert self.consumed_samples < self.total_samples, \ [default2]: return f(*args, **kwargs) [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: main() [default2]: pretrain( [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default2]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default0]: return f(*args, **kwargs) [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:Traceback (most recent call last): [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default3]: train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]: main() [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default3]: assert self.consumed_samples < self.total_samples, \ [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:Traceback (most recent call last): [default2]:TMP RESETTING CONSSAMPLES [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]: main() [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:TMP RESETTING CONSSAMPLES [default4]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: return f(*args, **kwargs) [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default5]: batch_sampler = MegatronPretrainingSampler( [default6]:Traceback (most 
recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]:TMP RESETTING CONSSAMPLES [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: pretrain( [default5]:Traceback (most recent call last): [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 1207, in build_train_valid_test_data_iterators [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: assert self.consumed_samples < self.total_samples, \ [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: train_dataloader = build_pretraining_data_loader( [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default2]:Traceback (most recent call last): [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert 
self.consumed_samples < self.total_samples, \ [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:TMP RESETTING CONSSAMPLES [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: 
main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: return f(*args, **kwargs) [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 
[default5]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default6]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ 
[default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default1]:TMP RESETTING CONSSAMPLES [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return 
f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = 
build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = 
MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 
83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default4]: assert self.consumed_samples < self.total_samples, \ 
[default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to 
consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: train_dataloader = build_pretraining_data_loader( [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ 
[default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: 
batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default1]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default0]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to 
consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
[default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: pretrain( [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: train_dataloader = build_pretraining_data_loader( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: return f(*args, **kwargs) [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]:Traceback (most recent call last): [default2]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main [default6]: pretrain( [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: assert self.consumed_samples < self.total_samples, \ [default2]: assert self.consumed_samples < self.total_samples, \ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_dataloader = build_pretraining_data_loader( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = 
build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default7]:Traceback (most recent call last): [default5]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: pretrain( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main [default5]: pretrain( [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]:Traceback (most recent call last): [default0]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default0]: main() [default0]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:TMP RESETTING CONSSAMPLES [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:Traceback (most recent call last): [default1]:Traceback (most recent call last): [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default1]: return f(*args, **kwargs) [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default1]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() 
[default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: pretrain( [default1]: batch_sampler = MegatronPretrainingSampler( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 
12547659 [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:Traceback (most recent call last): [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default2]:GOTCONSUMEDSAMPLES 178402224 5030720 [default2]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default1]: main() [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in 
pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main 
[default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, 
**kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:Traceback (most recent call last): [default1]:TMP RESETTING CONSSAMPLES [default1]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default1]: return f(*args, **kwargs) [default3]:GOTCONSUMEDSAMPLES 178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default1]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default1]: pretrain( [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default2]: assert self.consumed_samples < self.total_samples, \ [default6]:Traceback (most recent call last): [default6]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: train_dataloader = build_pretraining_data_loader( [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]: pretrain( [default1]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ 
[default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) 
[default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( 
[default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent 
call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: 
return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: 
pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: assert self.consumed_samples < self.total_samples, \ [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default3]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default3]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default0]:Traceback (most recent call last): [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, 
in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: 
no samples left to consume: 165854565, 12547659 [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: 
pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
[default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < 
self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]:Traceback (most recent call last): [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default2]: main() [default2]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default2]: return f(*args, **kwargs) [default2]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: return f(*args, **kwargs) [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, 
**kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = 
build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: 
File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, 
**kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", 
line 199, in main [default6]: pretrain( [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: assert self.consumed_samples < 
self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:Traceback (most recent call last): [default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default3]: main() [default3]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default3]: return f(*args, **kwargs) 
[default3]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default3]: pretrain( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default3]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default3]: train_dataloader = build_pretraining_data_loader( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default3]: batch_sampler = MegatronPretrainingSampler( [default3]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default3]: assert self.consumed_samples < self.total_samples, \ [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:Traceback (most recent call last): [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default0]: main() [default0]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default0]: return f(*args, **kwargs) [default0]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default0]: pretrain( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default1]: batch_sampler = MegatronPretrainingSampler( [default3]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default0]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: train_dataloader = build_pretraining_data_loader( [default1]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default4]: main() [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default2]: assert self.consumed_samples < self.total_samples, \ [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default1]: assert self.consumed_samples < self.total_samples, \ [default1]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default4]:GOTCONSUMEDSAMPLES 178402224 5030720 [default4]:TMP RESETTING CONSSAMPLES [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default7]:GOTCONSUMEDSAMPLES 178402224 5030720 [default7]:TMP RESETTING CONSSAMPLES [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default0]: batch_sampler = MegatronPretrainingSampler( [default0]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default0]: assert self.consumed_samples < self.total_samples, \ [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default6]: File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default6]:GOTCONSUMEDSAMPLES 178402224 5030720 [default6]:TMP RESETTING CONSSAMPLES [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:GOTCONSUMEDSAMPLES 178402224 5030720 [default5]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default0]:AssertionError: no samples left to consume: 165854565, 12547659 [default2]: pretrain( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default2]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default2]: train_dataloader = build_pretraining_data_loader( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default2]: batch_sampler = MegatronPretrainingSampler( [default2]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default2]: assert self.consumed_samples < self.total_samples, \ [default2]:AssertionError: no samples left to consume: 165854565, 12547659 [default3]:GOTCONSUMEDSAMPLES 
178402224 5030720 [default3]:TMP RESETTING CONSSAMPLES [default4]:Traceback (most recent call last): [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default4]: main() [default4]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default4]: return f(*args, **kwargs) [default4]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default4]: pretrain( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default4]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default4]: train_dataloader = build_pretraining_data_loader( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default4]: batch_sampler = MegatronPretrainingSampler( [default4]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default4]: assert self.consumed_samples < self.total_samples, \ [default4]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]:Traceback (most recent call last): [default6]:Traceback (most recent call last): [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default6]: main() [default7]: File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default7]: main() [default7]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default7]: return f(*args, **kwargs) [default7]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default6]: return f(*args, **kwargs) [default7]: pretrain( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default7]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default6]: pretrain( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default6]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default6]: train_dataloader = build_pretraining_data_loader( [default7]: train_dataloader = build_pretraining_data_loader( [default7]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default7]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default6]: batch_sampler = MegatronPretrainingSampler( [default6]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default6]: assert self.consumed_samples < self.total_samples, \ [default6]:AssertionError: no samples left to consume: 165854565, 12547659 [default7]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default7]: assert self.consumed_samples < self.total_samples, \ [default7]:AssertionError: no samples left to consume: 165854565, 12547659 [default5]:Traceback (most recent call last): [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 207, in [default5]: main() [default5]: File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [default5]: return f(*args, **kwargs) [default5]: File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main [default5]: pretrain( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain [default5]: train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [default5]: File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators [default5]: train_dataloader = build_pretraining_data_loader( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [default5]: batch_sampler = MegatronPretrainingSampler( [default5]: File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ [default5]: assert self.consumed_samples < self.total_samples, \ [default5]:AssertionError: no samples left to consume: 165854565, 12547659 WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2739728 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2330173 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1421396 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3033951 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2073150 closing signal SIGTERM ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1817551) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3083829) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1029859) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 471933) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3709907) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 615273) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2082292) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3694951) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1653624) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 614526) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1901997) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 509592) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3754479) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2118057) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2274413) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3142905) of 
binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 521718) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3886197) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3738592) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3121690) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3254373 closing signal SIGTERM ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2330172) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4077114) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1880226) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 350248) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1475855) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4016298) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2073147) of binary: 
/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1545503) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2772090) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 3033952) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1993410) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2739729) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1682116) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3254371) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1421397) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2064518) of binary: /gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/bin/python Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return f(*args, **kwargs) return 
_run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, exec(code, run_globals) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 257 
(local_rank: 1) exitcode : 1 (pid: 509593) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return _run_code(code, main_globals, None, [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 258 (local_rank: 2) exitcode : 1 (pid: 509594) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 
165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 259 (local_rank: 3) exitcode : 1 (pid: 509595) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 260 (local_rank: 4) exitcode : 1 (pid: 509596) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 261 (local_rank: 5) exitcode : 1 (pid: 509597) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader return _run_code(code, main_globals, None, 
elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 262 (local_rank: 6) exitcode : 1 (pid: 509598) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 263 (local_rank: 7) exitcode : 1 (pid: 509599) error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam45-ib0 rank : 256 (local_rank: 0) exitcode : 1 (pid: 509592) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in error_file: /tmp/torchelastic_dngvd5by/none_3cpnanu9/attempt_0/0/error.json traceback : Traceback (most recent call last): return _run_code(code, main_globals, None, exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam30-ib0 rank : 137 (local_rank: 1) exitcode : 1 (pid: 3694952) error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam30-ib0 rank : 138 (local_rank: 2) exitcode : 1 (pid: 3694953) error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam30-ib0 rank : 139 (local_rank: 3) exitcode : 1 (pid: 3694954) error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam30-ib0 rank : 140 (local_rank: 4) exitcode : 1 (pid: 3694955) error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam30-ib0 rank : 141 (local_rank: 5) exitcode : 1 (pid: 3694956) error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
    assert self.consumed_samples < self.total_samples, \
AssertionError: no samples left to consume: 165854565, 12547659
[6]:
  time : 2022-09-05_14:31:20
  host : jean-zay-iam30-ib0
  rank : 142 (local_rank: 6)
  exitcode : 1 (pid: 3694957)
  error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/6/error.json
  traceback : Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
    pretrain(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
    train_dataloader = build_pretraining_data_loader(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
    batch_sampler = MegatronPretrainingSampler(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
    assert self.consumed_samples < self.total_samples, \
AssertionError: no samples left to consume: 165854565, 12547659
[7]:
  time : 2022-09-05_14:31:20
  host : jean-zay-iam30-ib0
  rank : 143 (local_rank: 7)
  exitcode : 1 (pid: 3694958)
  error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/7/error.json
  traceback : Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
    pretrain(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain
    train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators
    train_dataloader = build_pretraining_data_loader(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
    batch_sampler = MegatronPretrainingSampler(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
    assert self.consumed_samples < self.total_samples, \
AssertionError: no samples left to consume: 165854565, 12547659
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2022-09-05_14:31:20
  host : jean-zay-iam30-ib0
  rank : 136 (local_rank: 0)
  exitcode : 1 (pid: 3694951)
  error_file: /tmp/torchelastic_r8k_zzxs/none_3l7r0nvw/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main
pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return _run_code(code, main_globals, None, return 
_run_code(code, main_globals, None, return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main main() File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) return _run_code(code, main_globals, None, return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main raise ChildFailedError( main() main() raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( return f(*args, **kwargs) main() exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main 
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 209 (local_rank: 1) exitcode : 1 (pid: 1475856) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 153 (local_rank: 1) exitcode : 1 (pid: 614527) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 65 (local_rank: 1) exitcode : 1 (pid: 2073148) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: 
no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 9 (local_rank: 1) exitcode : 1 (pid: 1993411) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return _run_code(code, main_globals, None, [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 66 (local_rank: 2) exitcode : 1 (pid: 2073149) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 154 (local_rank: 2) exitcode : 1 (pid: 614528) error_file: 
/tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 68 (local_rank: 4) exitcode : 1 (pid: 2073151) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/4/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 210 (local_rank: 2) exitcode : 1 (pid: 1475857) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 155 (local_rank: 3) exitcode : 1 (pid: 614529) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 211 (local_rank: 3) exitcode : 1 (pid: 1475858) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/3/error.json traceback : Traceback (most recent call last): main() run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 10 (local_rank: 2) exitcode : 1 (pid: 1993412) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) main() File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 11 (local_rank: 3) exitcode : 1 (pid: 1993413) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 69 (local_rank: 5) exitcode : 1 (pid: 2073152) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 156 (local_rank: 4) exitcode : 1 (pid: 614530) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 212 (local_rank: 4) exitcode : 1 (pid: 1475859) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 70 (local_rank: 6) exitcode : 1 (pid: 2073153) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( raise ChildFailedError( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 12 (local_rank: 4) exitcode : 1 (pid: 1993414) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/4/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 157 (local_rank: 5) exitcode : 1 (pid: 614531) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 213 (local_rank: 5) exitcode : 1 (pid: 1475860) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/5/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 71 (local_rank: 7) exitcode : 1 (pid: 2073154) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader main() [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 13 (local_rank: 5) exitcode : 1 (pid: 1993415) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper 
return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 158 (local_rank: 6) exitcode : 1 (pid: 614532) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 214 (local_rank: 6) exitcode : 1 (pid: 1475861) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return f(*args, **kwargs) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam11-ib0 rank : 64 (local_rank: 0) exitcode : 1 (pid: 2073147) error_file: /tmp/torchelastic_initag2k/none_r701wpr6/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 177 (local_rank: 1) exitcode : 1 (pid: 1653625) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 14 (local_rank: 6) exitcode : 1 (pid: 1993416) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 159 (local_rank: 7) exitcode : 1 (pid: 614533) error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = 
build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 215 (local_rank: 7) exitcode : 1 (pid: 1475862) error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 105 (local_rank: 1) exitcode : 1 (pid: 1545504) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam32-ib0 rank : 152 (local_rank: 0) exitcode : 1 (pid: 614526) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 169 (local_rank: 1) exitcode : 1 (pid: 1817552) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader 
batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam39-ib0 rank : 208 (local_rank: 0) exitcode : 1 (pid: 1475855) [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 15 (local_rank: 7) exitcode : 1 (pid: 1993417) error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run error_file: /tmp/torchelastic_crxz07ia/none_g1lzz6ma/attempt_0/0/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 178 (local_rank: 2) exitcode : 1 (pid: 1653626) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run error_file: /tmp/torchelastic_g0rhbrjd/none_f3y_6f_a/attempt_0/0/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam03-ib0 rank : 8 (local_rank: 0) exitcode : 1 (pid: 1993410) [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 106 (local_rank: 2) exitcode : 1 (pid: 1545505) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader 
batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 179 (local_rank: 3) exitcode : 1 (pid: 1653627) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/3/error.json traceback : Traceback (most recent call last): return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main error_file: /tmp/torchelastic_atfrhvyh/none_1xmbdfbo/attempt_0/0/error.json traceback : Traceback (most recent call last): main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 107 (local_rank: 3) exitcode : 1 (pid: 1545506) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 170 (local_rank: 2) exitcode : 1 (pid: 1817553) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 171 (local_rank: 3) exitcode : 1 (pid: 1817554) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 
165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 180 (local_rank: 4) exitcode : 1 (pid: 1653628) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 108 (local_rank: 4) exitcode : 1 (pid: 1545507) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < 
self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 172 (local_rank: 4) exitcode : 1 (pid: 1817555) error_file: 
/tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 181 (local_rank: 5) exitcode : 1 (pid: 1653629) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( main() raise ChildFailedError( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 109 (local_rank: 5) exitcode : 1 (pid: 1545508) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader raise ChildFailedError( return f(*args, **kwargs) elastic_launch( main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [5]: time : 2022-09-05_14:31:20 host : 
jean-zay-iam34-ib0 rank : 173 (local_rank: 5) exitcode : 1 (pid: 1817556) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 182 (local_rank: 6) exitcode : 1 (pid: 1653630) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 110 (local_rank: 6) exitcode : 1 (pid: 1545509) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ run(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 201 (local_rank: 1) exitcode : 1 (pid: 3886198) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 174 (local_rank: 6) exitcode : 1 (pid: 1817557) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 183 (local_rank: 7) exitcode : 1 (pid: 1653631) error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( main() elastic_launch( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 161 (local_rank: 1) exitcode : 1 (pid: 471934) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 111 (local_rank: 7) exitcode : 1 (pid: 1545510) error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, 
in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam35-ib0 rank : 176 (local_rank: 0) exitcode : 1 (pid: 1653624) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 185 (local_rank: 1) exitcode : 1 (pid: 1901998) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 193 (local_rank: 1) exitcode : 1 (pid: 3254372) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/1/error.json traceback : 
Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 218 (local_rank: 2) exitcode : 1 (pid: 1421398) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam19-ib0 rank : 104 (local_rank: 0) exitcode : 1 (pid: 1545503) [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 175 (local_rank: 7) exitcode : 1 (pid: 1817558) error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_3l893nf5/none_s4lem5rq/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 202 (local_rank: 2) exitcode : 1 (pid: 3886199) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main error_file: /tmp/torchelastic_6s8ddu_m/none_pt2pyeyx/attempt_0/0/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 241 (local_rank: 1) exitcode : 1 (pid: 3083830) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam34-ib0 rank : 168 (local_rank: 0) exitcode : 1 (pid: 1817551) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 195 (local_rank: 3) exitcode : 1 (pid: 3254374) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 162 (local_rank: 2) exitcode : 1 (pid: 471935) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 219 (local_rank: 3) exitcode : 1 (pid: 1421399) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 203 (local_rank: 3) exitcode : 1 (pid: 3886200) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 error_file: /tmp/torchelastic_7bjhnqdr/none_u0zsiwdm/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in 
build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 196 (local_rank: 4) exitcode : 1 (pid: 3254375) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 163 (local_rank: 3) exitcode : 1 (pid: 471936) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/3/error.json traceback : Traceback (most recent call last): elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 220 (local_rank: 4) exitcode : 1 (pid: 1421400) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 242 (local_rank: 2) exitcode : 1 (pid: 3083831) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 186 (local_rank: 2) exitcode : 1 (pid: 1901999) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 243 (local_rank: 3) exitcode : 1 (pid: 3083832) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 187 (local_rank: 3) exitcode : 1 (pid: 1902000) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/3/error.json traceback : Traceback (most recent call last): raise ChildFailedError( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 204 (local_rank: 4) exitcode : 1 (pid: 3886201) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/4/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 197 (local_rank: 5) exitcode : 1 (pid: 3254376) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 
12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 164 (local_rank: 4) exitcode : 1 (pid: 471937) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 221 (local_rank: 5) exitcode : 1 (pid: 1421401) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 205 (local_rank: 5) exitcode : 1 (pid: 3886202) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 244 (local_rank: 4) exitcode : 1 (pid: 3083833) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 188 (local_rank: 4) exitcode : 1 (pid: 1902001) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 33 (local_rank: 1) exitcode : 1 (pid: 3754480) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 198 (local_rank: 6) exitcode : 1 (pid: 3254377) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 165 (local_rank: 5) exitcode : 1 (pid: 471938) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 222 (local_rank: 6) exitcode : 1 (pid: 1421402) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 206 (local_rank: 6) exitcode : 1 (pid: 3886203) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 245 (local_rank: 5) exitcode : 1 (pid: 3083834) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 189 (local_rank: 5) exitcode : 1 (pid: 1902002) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 34 (local_rank: 2) exitcode : 1 (pid: 3754481) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 199 (local_rank: 7) exitcode : 1 (pid: 3254378) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 166 (local_rank: 6) exitcode : 1 (pid: 471939) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( run(args) batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 223 (local_rank: 7) exitcode : 1 (pid: 1421403) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 35 (local_rank: 3) exitcode : 1 (pid: 3754483) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 
1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 207 (local_rank: 7) exitcode : 1 (pid: 3886204) error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 246 (local_rank: 6) exitcode : 1 (pid: 3083835) error_file: 
/tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 190 (local_rank: 6) exitcode : 1 (pid: 1902003) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam37-ib0 rank : 192 (local_rank: 0) exitcode : 1 (pid: 3254371) error_file: /tmp/torchelastic_yrb4bfyo/none_78pi0v97/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 167 (local_rank: 7) exitcode : 1 (pid: 471940) error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam40-ib0 rank : 217 (local_rank: 1) exitcode : 1 (pid: 1421397) error_file: /tmp/torchelastic_ugmpdj0h/none_svp5ty48/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam38-ib0 rank : 200 (local_rank: 0) exitcode : 1 (pid: 3886197) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam33-ib0 rank : 160 (local_rank: 0) exitcode : 1 (pid: 471933) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ error_file: /tmp/torchelastic_x1s0prbm/none_ymv48vud/attempt_0/0/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 247 (local_rank: 7) exitcode : 1 (pid: 3083836) error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( main() [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 191 
(local_rank: 7) exitcode : 1 (pid: 1902004) error_file: /tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 36 (local_rank: 4) exitcode : 1 (pid: 3754484) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_1y9lj06q/none_yzq_6ler/attempt_0/0/error.json traceback : Traceback (most recent call last): torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ 
Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 265 (local_rank: 1) exitcode : 1 (pid: 4016299) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam43-ib0 rank : 240 (local_rank: 0) exitcode : 1 (pid: 3083829) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam36-ib0 rank : 184 (local_rank: 0) exitcode : 1 (pid: 1901997) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ error_file: /tmp/torchelastic_wmtubi2s/none_fdcd1wwy/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: 
/tmp/torchelastic_hs251nvi/none_0me88nzs/attempt_0/0/error.json traceback : Traceback (most recent call last): [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 37 (local_rank: 5) exitcode : 1 (pid: 3754485) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 38 (local_rank: 6) exitcode : 1 (pid: 3754486) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 main() File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 266 (local_rank: 2) exitcode : 1 (pid: 4016300) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 39 (local_rank: 7) exitcode : 1 (pid: 3754487) error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam06-ib0 rank : 32 (local_rank: 0) exitcode : 1 (pid: 3754479) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 267 (local_rank: 3) exitcode : 1 (pid: 4016301) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/3/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_iw8ubfxi/none_85csr8in/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples 
< self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 268 (local_rank: 4) exitcode : 1 (pid: 4016302) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [5]: time : 2022-09-05_14:31:20 host : 
jean-zay-iam46-ib0 rank : 269 (local_rank: 5) exitcode : 1 (pid: 4016303) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader elastic_launch( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 270 (local_rank: 6) exitcode : 1 (pid: 4016304) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 3738593) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 elastic_launch( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 271 (local_rank: 7) exitcode : 1 (pid: 4016305) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam46-ib0 rank : 264 (local_rank: 0) 
exitcode : 1 (pid: 4016298) return f(*args, **kwargs) error_file: /tmp/torchelastic_ji8flz_n/none_p4bndsm9/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main elastic_launch( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 3738594) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 3738595) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/3/error.json traceback : Traceback (most recent call last): return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) return launch_agent(self._config, self._entrypoint, list(args)) return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 4 (local_rank: 4) exitcode : 1 (pid: 3738596) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 5 (local_rank: 5) exitcode : 1 (pid: 3738597) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 34, in build_pretraining_data_loader raise ChildFailedError( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 6 (local_rank: 6) exitcode : 1 (pid: 3738598) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to 
consume: 165854565, 12547659 raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam02-ib0 rank : 7 (local_rank: 7) exitcode : 1 (pid: 3738599) error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 
host : jean-zay-iam02-ib0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 3738592) raise ChildFailedError( error_file: /tmp/torchelastic_11sph7qp/none_i863iiqb/attempt_0/0/error.json traceback : Traceback (most recent call last): raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 50 (local_rank: 2) exitcode : 1 (pid: 3033953) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 273 (local_rank: 1) exitcode : 1 (pid: 1029860) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 129 (local_rank: 1) exitcode : 1 (pid: 3709908) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 225 (local_rank: 1) exitcode : 1 (pid: 2772091) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 57 (local_rank: 1) exitcode : 1 (pid: 2118058) error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 17 (local_rank: 1) exitcode : 1 (pid: 2082293) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 51 (local_rank: 3) exitcode : 1 (pid: 3033954) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: 
time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 130 (local_rank: 2) exitcode : 1 (pid: 3709909) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 226 (local_rank: 2) exitcode : 1 (pid: 2772092) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 274 (local_rank: 2) exitcode : 1 (pid: 1029861) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", 
line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 52 (local_rank: 4) exitcode : 1 (pid: 3033955) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ 
assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 131 (local_rank: 3) exitcode : 1 (pid: 3709910) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 227 (local_rank: 3) exitcode : 1 (pid: 2772093) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples 
left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 275 (local_rank: 3) exitcode : 1 (pid: 1029862) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 41 (local_rank: 1) exitcode : 1 (pid: 4077115) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 58 (local_rank: 2) exitcode : 1 
(pid: 2118059) error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 59 (local_rank: 3) exitcode : 1 (pid: 2118060) error_file: 
/tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 18 (local_rank: 2) exitcode : 1 (pid: 2082294) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 53 (local_rank: 5) exitcode : 1 (pid: 3033956) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to 
consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 132 (local_rank: 4) exitcode : 1 (pid: 3709911) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 228 (local_rank: 4) exitcode : 1 (pid: 2772094) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 42 (local_rank: 2) exitcode : 1 (pid: 4077116) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 276 (local_rank: 4) exitcode : 1 (pid: 1029863) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 19 (local_rank: 3) exitcode : 1 (pid: 2082295) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 
host : jean-zay-iam07-ib0 rank : 43 (local_rank: 3) exitcode : 1 (pid: 4077117) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 54 (local_rank: 6) exitcode : 1 (pid: 3033957) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 133 (local_rank: 5) exitcode : 1 (pid: 3709912) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 229 (local_rank: 5) exitcode : 1 (pid: 2772095) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 277 (local_rank: 5) exitcode : 1 (pid: 1029864) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, 
in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 60 (local_rank: 4) exitcode : 1 (pid: 2118061) error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 20 (local_rank: 4) exitcode : 1 (pid: 2082296) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return 
launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 55 (local_rank: 7) exitcode : 1 (pid: 3033958) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 134 (local_rank: 6) exitcode : 1 (pid: 3709913) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 230 (local_rank: 6) exitcode : 1 (pid: 2772096) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 44 (local_rank: 4) exitcode : 1 (pid: 4077118) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < 
self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 278 (local_rank: 6) exitcode : 1 (pid: 1029865) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 61 (local_rank: 5) exitcode : 1 (pid: 2118062) error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, 
in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 21 (local_rank: 5) exitcode : 1 (pid: 2082297) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam08-ib0 rank : 49 (local_rank: 1) exitcode : 1 (pid: 3033952) error_file: /tmp/torchelastic_8063ir5b/none_wc5eactu/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 135 (local_rank: 7) exitcode : 1 (pid: 
3709914) error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 231 (local_rank: 7) exitcode : 1 (pid: 2772097) error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 45 (local_rank: 5) exitcode : 1 (pid: 4077119) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 279 (local_rank: 7) exitcode : 1 (pid: 1029866) error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 62 (local_rank: 6) exitcode : 1 (pid: 2118063) error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam28-ib0 rank : 128 (local_rank: 0) exitcode : 1 (pid: 3709907) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam41-ib0 rank : 224 (local_rank: 0) exitcode : 1 (pid: 2772090) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam47-ib0 rank : 272 (local_rank: 0) exitcode : 1 (pid: 1029859) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 22 (local_rank: 6) exitcode : 1 (pid: 2082298) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_tix4xnvy/none_tof1l0nt/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_q4r5_12m/none_33jmjokh/attempt_0/0/error.json traceback : Traceback (most recent call last): batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 46 (local_rank: 6) exitcode : 1 (pid: 4077120) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic__m5xftw4/none_n8hb4xkg/attempt_0/0/error.json traceback : Traceback (most recent call last): elastic_launch( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 63 (local_rank: 7) exitcode : 1 (pid: 2118064) 
error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 
199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 113 (local_rank: 1) exitcode : 1 (pid: 521719) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam09-ib0 rank : 56 (local_rank: 0) exitcode : 1 (pid: 2118057) [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 23 (local_rank: 7) exitcode : 1 (pid: 2082299) error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 47 (local_rank: 7) exitcode : 1 (pid: 4077121) error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ error_file: /tmp/torchelastic_fnkb4521/none_qtfj1ibl/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam04-ib0 rank : 16 (local_rank: 0) exitcode : 1 (pid: 2082292) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in 
build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam07-ib0 rank : 40 (local_rank: 0) exitcode : 1 (pid: 4077114) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 raise ChildFailedError( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( error_file: /tmp/torchelastic_g6hu2_xf/none_icr79ec3/attempt_0/0/error.json traceback : Traceback (most recent call last): error_file: /tmp/torchelastic_a6y4_xnh/none_vlnhrj5l/attempt_0/0/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 114 (local_rank: 2) exitcode : 1 (pid: 521720) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to 
consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( raise ChildFailedError( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 233 (local_rank: 1) exitcode : 1 (pid: 3142906) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 115 (local_rank: 3) exitcode : 1 (pid: 521721) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 82 (local_rank: 2) exitcode : 1 (pid: 2330174) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 98 (local_rank: 2) exitcode : 1 (pid: 2739730) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 116 (local_rank: 4) exitcode : 1 (pid: 521722) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 234 (local_rank: 2) exitcode : 1 (pid: 3142907) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 235 (local_rank: 3) exitcode : 1 (pid: 3142908) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/3/error.json traceback : Traceback (most recent call last): raise ChildFailedError( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 117 (local_rank: 5) exitcode : 1 (pid: 521723) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 99 (local_rank: 3) exitcode : 1 (pid: 2739731) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 83 (local_rank: 3) exitcode : 1 (pid: 2330175) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 100 (local_rank: 4) exitcode : 1 (pid: 2739732) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 84 (local_rank: 4) exitcode : 1 (pid: 2330176) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 118 (local_rank: 6) exitcode : 1 (pid: 521724) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 236 (local_rank: 4) exitcode : 1 (pid: 3142909) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 119 (local_rank: 7) exitcode : 1 (pid: 521725) error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 101 (local_rank: 5) exitcode : 1 (pid: 2739733) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 85 (local_rank: 5) exitcode : 1 (pid: 2330177) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 237 (local_rank: 5) exitcode : 1 (pid: 3142910) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : 
jean-zay-iam27-ib0 rank : 121 (local_rank: 1) exitcode : 1 (pid: 350249) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam26-ib0 rank : 112 (local_rank: 0) exitcode : 1 (pid: 521718) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 145 (local_rank: 1) exitcode : 1 (pid: 615274) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_923rdxly/none_up4bcvs4/attempt_0/0/error.json traceback : Traceback (most recent call last): [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 102 (local_rank: 6) exitcode : 1 (pid: 2739734) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 86 (local_rank: 6) exitcode : 1 (pid: 2330178) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/6/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 238 (local_rank: 6) exitcode : 1 (pid: 3142911) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ 
AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 103 (local_rank: 7) 
exitcode : 1 (pid: 2739735) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 87 (local_rank: 7) exitcode : 1 (pid: 2330179) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 239 (local_rank: 7) exitcode : 1 (pid: 3142912) error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam42-ib0 rank : 232 (local_rank: 0) exitcode : 1 (pid: 3142905) [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 122 (local_rank: 2) exitcode : 1 (pid: 350250) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam18-ib0 rank : 97 (local_rank: 1) exitcode : 1 (pid: 2739729) error_file: /tmp/torchelastic_5811g1a8/none_tgjnadso/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( ------------------------------------------------------------ Root 
Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam14-ib0 rank : 80 (local_rank: 0) exitcode : 1 (pid: 2330172) error_file: /tmp/torchelastic_3mkvyd2j/none_mxk4icv_/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_gtr7uup5/none_gkbb7o2w/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 123 (local_rank: 3) exitcode : 1 (pid: 350251) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 146 
(local_rank: 2) exitcode : 1 (pid: 615275) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in 
main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 147 (local_rank: 3) exitcode : 1 (pid: 615276) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : 
jean-zay-iam27-ib0 rank : 124 (local_rank: 4) exitcode : 1 (pid: 350252) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, 
valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 125 (local_rank: 5) exitcode : 1 (pid: 350253) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 148 (local_rank: 4) exitcode : 1 (pid: 615277) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 
345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 126 (local_rank: 6) exitcode : 1 (pid: 350254) error_file: 
/tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 149 (local_rank: 5) exitcode : 1 (pid: 615278) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 127 (local_rank: 7) exitcode : 1 (pid: 350255) error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 150 (local_rank: 6) exitcode : 1 (pid: 615279) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/6/error.json 
traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam27-ib0 rank : 120 (local_rank: 0) exitcode : 1 (pid: 350248) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = 
MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 error_file: /tmp/torchelastic_xhowux29/none_tw3j0hra/attempt_0/0/error.json traceback : Traceback (most recent call last): [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 151 (local_rank: 7) exitcode : 1 (pid: 615280) error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators 
train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam31-ib0 rank : 144 (local_rank: 0) exitcode : 1 (pid: 615273) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ error_file: /tmp/torchelastic_c3f4p7kx/none_e6zdop4t/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent return launch_agent(self._config, self._entrypoint, list(args)) Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( raise ChildFailedError( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ 
/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 73 (local_rank: 1) exitcode : 1 (pid: 2064519) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 281 (local_rank: 1) exitcode : 1 (pid: 1880227) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 74 (local_rank: 2) exitcode : 1 (pid: 2064520) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 75 (local_rank: 3) exitcode : 1 (pid: 2064521) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/3/error.json traceback : Traceback (most recent call last): [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 282 (local_rank: 2) exitcode : 1 (pid: 1880228) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/2/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 76 (local_rank: 4) exitcode : 1 (pid: 2064522) error_file: 
/tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 283 (local_rank: 3) exitcode : 1 (pid: 1880229) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/3/error.json traceback : Traceback (most recent call last): return _run_code(code, main_globals, None, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 77 (local_rank: 5) exitcode : 1 (pid: 2064523) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = 
build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/runpy.py", line 87, in _run_code File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 284 (local_rank: 4) exitcode : 1 (pid: 1880230) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( 
batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 78 (local_rank: 6) exitcode : 1 (pid: 2064524) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain 
train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 285 (local_rank: 5) exitcode : 1 (pid: 1880231) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 79 (local_rank: 7) exitcode : 1 (pid: 2064525) error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/7/error.json traceback : Traceback (most 
recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam13-ib0 rank : 72 (local_rank: 0) exitcode : 1 (pid: 2064518) batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 286 (local_rank: 6) exitcode : 1 (pid: 1880232) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( error_file: /tmp/torchelastic_d_0iz7mz/none_h96adn0e/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 287 (local_rank: 7) exitcode : 1 (pid: 1880233) error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam52-ib0 rank : 280 (local_rank: 0) exitcode : 1 (pid: 1880226) exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in error_file: /tmp/torchelastic_cfai2ege/none_4bxtjcyc/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper main() File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main return f(*args, **kwargs) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run run(args) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent elastic_launch( File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( return launch_agent(self._config, self._entrypoint, list(args)) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 89 (local_rank: 1) exitcode : 1 (pid: 2274414) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( raise ChildFailedError( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 25 (local_rank: 1) exitcode : 1 (pid: 3121692) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/1/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 90 (local_rank: 2) exitcode : 1 (pid: 2274415) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 249 (local_rank: 1) exitcode : 1 (pid: 1682117) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/1/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 91 (local_rank: 3) exitcode : 1 (pid: 2274416) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in 
__init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 92 (local_rank: 4) exitcode : 1 (pid: 2274417) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 26 (local_rank: 2) exitcode : 1 (pid: 3121693) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [2]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 250 (local_rank: 2) exitcode : 1 (pid: 1682118) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/2/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 93 (local_rank: 5) exitcode : 1 (pid: 2274418) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File 
"/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 27 (local_rank: 3) exitcode : 1 (pid: 3121694) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 94 (local_rank: 6) exitcode : 1 (pid: 2274419) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [3]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 251 (local_rank: 3) exitcode : 1 (pid: 1682119) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/3/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", 
line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 28 (local_rank: 4) exitcode : 1 (pid: 3121695) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 95 (local_rank: 7) exitcode : 1 (pid: 2274420) error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam15-ib0 rank : 88 (local_rank: 0) exitcode : 1 (pid: 2274413) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [4]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 252 (local_rank: 4) exitcode : 1 (pid: 1682120) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/4/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 29 (local_rank: 5) exitcode : 1 (pid: 3121696) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", 
line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_vnpiq7yo/none_mhrci85p/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( [5]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 253 (local_rank: 5) exitcode : 1 (pid: 1682121) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/5/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 30 (local_rank: 6) exitcode : 1 (pid: 3121697) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in 
wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert 
self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 [6]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 254 (local_rank: 6) exitcode : 1 (pid: 1682122) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/6/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 31 (local_rank: 7) exitcode : 1 (pid: 3121698) error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/7/error.json traceback : Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, 
test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam05-ib0 rank : 24 (local_rank: 0) exitcode : 1 (pid: 3121690) [7]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 255 (local_rank: 7) exitcode : 1 (pid: 1682123) error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/7/error.json traceback : Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( error_file: /tmp/torchelastic_y5r2osbl/none_11j5sfru/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2022-09-05_14:31:20 host : jean-zay-iam44-ib0 rank : 248 (local_rank: 0) exitcode : 1 (pid: 1682116) File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( error_file: /tmp/torchelastic__1zfl8eb/none_6dsdoad7/attempt_0/0/error.json traceback : Traceback (most recent call last): File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader batch_sampler = MegatronPretrainingSampler( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__ assert self.consumed_samples < self.total_samples, \ AssertionError: no samples left to consume: 165854565, 12547659 ============================================================ File "/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper return f(*args, **kwargs) File "/gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/finetune_t0.py", line 199, in main pretrain( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 160, in pretrain train_data_iterator, valid_data_iterator, test_data_iterator = build_train_valid_test_data_iterators( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/training.py", line 1207, in build_train_valid_test_data_iterators train_dataloader = build_pretraining_data_loader( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 34, in build_pretraining_data_loader
    batch_sampler = MegatronPretrainingSampler(
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data/data_samplers.py", line 83, in __init__
    assert self.consumed_samples < self.total_samples, \
AssertionError: no samples left to consume: 165854565, 12547659
============================================================
srun: error: jean-zay-iam35: task 22: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=1007214.0
srun: error: jean-zay-iam45: task 32: Exited with exit code 1
srun: error: jean-zay-iam11: task 8: Exited with exit code 1
srun: error: jean-zay-iam08: task 6: Exited with exit code 1
srun: error: jean-zay-iam36: task 23: Exited with exit code 1
srun: error: jean-zay-iam47: task 34: Exited with exit code 1
srun: error: jean-zay-iam43: task 30: Exited with exit code 1
srun: error: jean-zay-iam46: task 33: Exited with exit code 1
srun: error: jean-zay-iam42: task 29: Exited with exit code 1
srun: error: jean-zay-iam32: task 19: Exited with exit code 1
srun: error: jean-zay-iam41: task 28: Exited with exit code 1
srun: error: jean-zay-iam44: task 31: Exited with exit code 1
srun: error: jean-zay-iam33: task 20: Exited with exit code 1
srun: error: jean-zay-iam31: task 18: Exited with exit code 1
srun: error: jean-zay-iam14: task 10: Exited with exit code 1
srun: error: jean-zay-iam30: task 17: Exited with exit code 1
srun: error: jean-zay-iam38: task 25: Exited with exit code 1
srun: error: jean-zay-iam40: task 27: Exited with exit code 1
srun: error: jean-zay-iam39: task 26: Exited with exit code 1
srun: error: jean-zay-iam26: task 14: Exited with exit code 1
srun: error: jean-zay-iam27: task 15: Exited with exit code 1
srun: error: jean-zay-iam03: task 1: Exited with exit code 1
srun: error: jean-zay-iam15: task 11: Exited with exit code 1
srun: error: jean-zay-iam05: task 3: Exited with exit code 1
srun: error: jean-zay-iam09: task 7: Exited with exit code 1
srun: error: jean-zay-iam28: task 16: Exited with exit code 1
srun: error: jean-zay-iam04: task 2: Exited with exit code 1
srun: error: jean-zay-iam19: task 13: Exited with exit code 1
srun: error: jean-zay-iam34: task 21: Exited with exit code 1
srun: error: jean-zay-iam18: task 12: Exited with exit code 1
srun: error: jean-zay-iam06: task 4: Exited with exit code 1
srun: error: jean-zay-iam37: task 24: Exited with exit code 1
srun: error: jean-zay-iam07: task 5: Exited with exit code 1
srun: error: jean-zay-iam02: task 0: Exited with exit code 1
srun: error: jean-zay-iam13: task 9: Exited with exit code 1
srun: error: jean-zay-iam52: task 35: Exited with exit code 1
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:__main__: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** [default0]:using world size: 288, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 72 [default0]:accumulate and all-reduce gradients in fp32 for bfloat16 data type. [default0]:using torch.bfloat16 for parameters ... [default0]:------------------------ arguments ------------------------ [default0]: abort_on_unmet_fused_kernel_constraints ......... True [default0]: accumulate_allreduce_grads_in_fp32 .............. True [default0]: adam_beta1 ...................................... 0.9 [default0]: adam_beta2 ...................................... 0.95 [default0]: adam_eps ........................................ 1e-08 [default0]: adlr_autoresume ................................. False [default0]: adlr_autoresume_interval ........................ 1000 [default0]: apply_query_key_layer_scaling ................... True [default0]: apply_residual_connection_post_layernorm ........ False [default0]: attention_dropout ............................... 
0.1 [default0]: attention_softmax_in_fp32 ....................... False [default0]: bert_binary_head ................................ True [default0]: bert_load ....................................... None [default0]: bf16 ............................................ True [default0]: bias_dropout_fusion ............................. True [default0]: bias_gelu_fusion ................................ True [default0]: biencoder_projection_dim ........................ 0 [default0]: biencoder_shared_query_context_model ............ False [default0]: block_data_path ................................. None [default0]: checkpoint_activations .......................... True [default0]: checkpoint_in_cpu ............................... False [default0]: checkpoint_num_layers ........................... 1 [default0]: clip_grad ....................................... 1.0 [default0]: codecarbon_dir .................................. None [default0]: consumed_train_samples .......................... 0 [default0]: consumed_train_tokens ........................... 0 [default0]: consumed_valid_samples .......................... 0 [default0]: contigious_checkpointing ........................ False [default0]: cpu_optimizer ................................... False [default0]: cpu_torch_adam .................................. False [default0]: curriculum_learning ............................. False [default0]: data_impl ....................................... mmap [default0]: data_parallel_size .............................. 4 [default0]: data_path ....................................... None [default0]: dataloader_type ................................. single [default0]: DDP_impl ........................................ local [default0]: decoder_seq_length .............................. None [default0]: deepscale ....................................... False [default0]: deepscale_config ................................ 
None [default0]: deepspeed ....................................... True [default0]: deepspeed_activation_checkpointing .............. True [default0]: deepspeed_config ................................ ./ds_config.1009240.json [default0]: deepspeed_mpi ................................... False [default0]: distribute_checkpointed_activations ............. False [default0]: distributed_backend ............................. nccl [default0]: embed_layernorm ................................. True [default0]: embedding_path .................................. None [default0]: encoder_seq_length .............................. 2048 [default0]: eod_mask_loss ................................... False [default0]: eval_interval ................................... 250 [default0]: eval_iters ...................................... 10 [default0]: eval_only ....................................... True [default0]: evidence_data_path .............................. None [default0]: exit_duration_in_mins ........................... 5990 [default0]: exit_interval ................................... None [default0]: ffn_hidden_size ................................. 57344 [default0]: finetune ........................................ False [default0]: fp16 ............................................ False [default0]: fp16_lm_cross_entropy ........................... False [default0]: fp32_residual_connection ........................ False [default0]: gigaflos_no_embeds .............................. 0 [default0]: global_batch_size ............................... 2048 [default0]: glu_activation .................................. None [default0]: hidden_dropout .................................. 0.1 [default0]: hidden_size ..................................... 14336 [default0]: hysteresis ...................................... 2 [default0]: ict_head_size ................................... None [default0]: ict_load ........................................ 
None [default0]: img_dim ......................................... 224 [default0]: indexer_batch_size .............................. 128 [default0]: indexer_log_interval ............................ 1000 [default0]: inference ....................................... False [default0]: init_method_std ................................. 0.0048 [default0]: init_method_xavier_uniform ...................... False [default0]: initial_loss_scale .............................. 4294967296 [default0]: kill_switch_path ................................ /gpfswork/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/kill-switch-tr13-176B-mtf [default0]: kv_channels ..................................... 128 [default0]: layernorm_epsilon ............................... 1e-05 [default0]: lazy_mpu_init ................................... None [default0]: load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: local_rank ...................................... None [default0]: log_batch_size_to_tensorboard ................... True [default0]: log_interval .................................... 1 [default0]: log_learning_rate_to_tensorboard ................ True [default0]: log_level ....................................... None [default0]: log_level_replica ............................... None [default0]: log_loss_scale_to_tensorboard ................... True [default0]: log_num_zeros_in_grad ........................... False [default0]: log_params_norm ................................. False [default0]: log_path ........................................ None [default0]: log_timers_to_tensorboard ....................... True [default0]: log_validation_ppl_to_tensorboard ............... True [default0]: loss_on_targets_only ............................ False [default0]: loss_scale ...................................... None [default0]: loss_scale_window ............................... 
1000 [default0]: lr .............................................. 2e-05 [default0]: lr_decay_iters .................................. None [default0]: lr_decay_samples ................................ None [default0]: lr_decay_style .................................. constant [default0]: lr_decay_tokens ................................. None [default0]: lr_warmup_fraction .............................. None [default0]: lr_warmup_iters ................................. 0 [default0]: lr_warmup_samples ............................... 0 [default0]: make_vocab_size_divisible_by .................... 128 [default0]: mask_prob ....................................... 0.15 [default0]: masked_softmax_fusion ........................... True [default0]: max_position_embeddings ......................... 2048 [default0]: mean_noise_span_length .......................... None [default0]: memory_centric_tiled_linear ..................... False [default0]: merge_file ...................................... None [default0]: micro_batch_size ................................ 1 [default0]: min_loss_scale .................................. 1.0 [default0]: min_lr .......................................... 0.0 [default0]: mmap_warmup ..................................... False [default0]: no_load_optim ................................... True [default0]: no_load_rng ..................................... None [default0]: no_save_optim ................................... None [default0]: no_save_rng ..................................... None [default0]: noise_density ................................... None [default0]: norm_target_loss ................................ True [default0]: num_attention_heads ............................. 112 [default0]: num_channels .................................... 3 [default0]: num_classes ..................................... 1000 [default0]: num_layers ...................................... 70 [default0]: num_layers_per_virtual_pipeline_stage ........... 
None [default0]: num_workers ..................................... 2 [default0]: onnx_safe ....................................... None [default0]: openai_gelu ..................................... False [default0]: optimizer ....................................... adam [default0]: override_lr_scheduler ........................... False [default0]: pad_vocab_size_to ............................... 250880 [default0]: params_dtype .................................... torch.bfloat16 [default0]: partition_activations ........................... False [default0]: patch_dim ....................................... 16 [default0]: pipeline_model_parallel_size .................... 72 [default0]: position_embedding_type ......................... PositionEmbeddingType.alibi [default0]: pp_partition_method ............................. type:transformer|embedding [default0]: prefixlm ........................................ False [default0]: profile_backward ................................ False [default0]: query_in_block_prob ............................. 0.1 [default0]: rampup_batch_size ............................... None [default0]: rank ............................................ 0 [default0]: remote_device ................................... none [default0]: reset_attention_mask ............................ False [default0]: reset_position_ids .............................. False [default0]: reset_progress .................................. True [default0]: retriever_report_topk_accuracies ................ [] [default0]: retriever_score_scaling ......................... False [default0]: retriever_seq_length ............................ 256 [default0]: reweight_loss_based_on_position_frequency ....... False [default0]: sample_rate ..................................... 1.0 [default0]: save ............................................ 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq [default0]: save_interval ................................... 5 [default0]: scatter_gather_tensors_in_pipeline .............. True [default0]: scattered_embeddings ............................ False [default0]: seed ............................................ 42 [default0]: seq_length ...................................... 2048 [default0]: sgd_momentum .................................... 0.9 [default0]: short_seq_prob .................................. 0.1 [default0]: skip_train_iteration_range ...................... None [default0]: split ........................................... None [default0]: split_transformers .............................. False [default0]: sync_tp_duplicated_parameters ................... True [default0]: synchronize_each_layer .......................... False [default0]: tensor_model_parallel_size ...................... 1 [default0]: tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/tr13-176B-ml-t0-logs/tensorboard/p31lossseq [default0]: tensorboard_log_interval ........................ 1 [default0]: tensorboard_queue_size .......................... 5 [default0]: test_weighted_split_paths ....................... None [default0]: test_weighted_split_paths_path .................. None [default0]: tile_factor ..................................... 1 [default0]: titles_data_path ................................ None [default0]: tokenizer_name_or_path .......................... bigscience/tokenizer [default0]: tokenizer_type .................................. PretrainedFromHF [default0]: train_iters ..................................... None [default0]: train_samples ................................... 6348800 [default0]: train_tokens .................................... None [default0]: train_weighted_split_names ...................... ['train'] [default0]: train_weighted_split_paths ...................... 
[['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train']] [default0]: train_weighted_split_paths_path ................. None [default0]: train_weighted_split_splits ..................... [['0:1']] [default0]: train_weighted_split_weights .................... [['1']] [default0]: universal_checkpoint ............................ True [default0]: use_bnb_optimizer ............................... False [default0]: use_checkpoint_lr_scheduler ..................... False [default0]: use_contiguous_buffers_in_ddp ................... True [default0]: use_cpu_initialization .......................... None [default0]: use_one_sent_docs ............................... False [default0]: use_pin_memory .................................. False [default0]: valid_num_workers ............................... 2 [default0]: valid_weighted_split_names ...................... ['validation_pretraining', 'valid'] [default0]: valid_weighted_split_paths ...................... [['/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document', 
'/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document', '/gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document'], ['/gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation']] [default0]: valid_weighted_split_paths_path ................. None [default0]: valid_weighted_split_splits ..................... [['0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0', '0.950:1.0'], ['0:1']] [default0]: valid_weighted_split_weights .................... 
[['0.0330676168743166', '0.011242051312222764', '0.13027200903379185', '0.22171164529099704', '0.10667815627928671', '0.0015595123898173287', '0.13054018439603915', '0.01091803753667153', '0.00011021422347108609', '0.005492381453597748', '0.0004021215011318779', '0.007470068593492175', '0.0006190467776576425', '0.0010335296343329384', '0.0005012010684646179', '0.0006672772956128299', '0.00035928138344705506', '0.0005084433130291778', '0.0021137328219915496', '0.0009129946225980253', '0.0012454301613725426', '0.00031588689199263235', '0.08137213783015229', '0.055293935695898196', '0.04954150576361177', '0.02461641286531197', '0.12091748245519074', '0.0005177025345001541'], ['1']] [default0]: virtual_pipeline_model_parallel_size ............ None [default0]: vocab_extra_ids ................................. 0 [default0]: vocab_file ...................................... None [default0]: weight_decay .................................... 0.0001 [default0]: world_size ...................................... 288 [default0]: zero_allgather_bucket_size ...................... 0.0 [default0]: zero_contigious_gradients ....................... False [default0]: zero_reduce_bucket_size ......................... 0.0 [default0]: zero_reduce_scatter ............................. False [default0]: zero_stage ...................................... 0 [default0]:-------------------- end of arguments --------------------- [default0]:setting number of micro-batches to constant 512 [default0]:> building PretrainedFromHF tokenizer ... [default0]: vocab file is un-used. 
loading tokenizer from pre-trained model [default0]:Offline mode: forcing local_files_only=True [default0]:Offline mode: forcing local_files_only=True [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer.json from cache at /gpfswork/rech/six/commun/models/29d0a41f4527257b8afe6d5495f492dac260318430f18239a42ca5f6dc4487fc.7b0fb8edc2986944ff9b7418149b52d8c4a1354a17d0360deb8974da70c6cc03 [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/added_tokens.json from cache at None [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/special_tokens_map.json from cache at /gpfswork/rech/six/commun/models/4f03e43bcc54e0721823e6a06b1d197905e2ea79aa7dcc1a0f0fcecc73ce3fb2.9d6cd81ef646692fb1c169a880161ea1cb95f49694f220aced9b704b457e51dd [default0]:loading file https://huggingface.co/bigscience/tokenizer/resolve/main/tokenizer_config.json from cache at /gpfswork/rech/six/commun/models/9441c67b923ef7a65950a64e31c40f80ed181ba59502981a80f2cd0c438c6432.3c09887250243e50d8de9d10b2a778152434f62a22a95b5f89dbbe79a6eb496a [default7]:> setting tensorboard ... [default0]: > padded vocab (size: 250680) with 200 dummy tokens (new size: 250880) [default0]:DeepSpeed general environment info: [default0]:torch install path ............... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/torch'] [default0]:torch version .................... 1.12.0 [default0]:torch cuda version ............... 11.3 [default0]:torch hip version ................ None [default0]:nvcc version ..................... 11.4 [default0]:deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr13f-6B3-ml-t0/lib/python3.8/site-packages/deepspeed'] [default0]:deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master [default0]:deepspeed wheel compiled w. ...... 
torch 1.12, cuda 11.3 [default0]:**** Git info for Megatron: git_hash=6c1018f git_branch=mtf-multival **** [default0]:> initializing torch distributed ... [default0]:[2022-09-05 14:34:21,050] [INFO] [comm.py:628:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [default0]:> initializing tensor model parallel with size 1 [default0]:> initializing pipeline model parallel with size 72 [default0]:> setting random seeds to 42 ... [default0]:[2022-09-05 14:34:30,013] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 [default0]:> compiling dataset index builder ... [default0]:make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:make: Nothing to be done for 'default'. [default0]:make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/data' [default0]:>>> done with dataset index builder. Compilation time: 0.090 seconds [default0]:> compiling and loading fused kernels ... [default0]:Detected CUDA files, patching ldflags [default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... [default0]:Building extension module scaled_upper_triang_masked_softmax_cuda... [default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default0]:ninja: no work to do. [default0]:Loading extension module scaled_upper_triang_masked_softmax_cuda... 
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module scaled_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module scaled_masked_softmax_cuda...
[default0]:Detected CUDA files, patching ldflags
[default0]:Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
[default0]:Building extension module fused_mix_prec_layer_norm_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:ninja: no work to do.
[default0]:Loading extension module fused_mix_prec_layer_norm_cuda...
[default0]:>>> done with compiling and loading fused kernels. Compilation time: 6.765 seconds
[default0]:time to initialize megatron (seconds): -35.129
[default0]:[after megatron is initialized] datetime: 2022-09-05 14:34:36
[default0]:building GPT model ...
[default0]:[2022-09-05 14:34:36,920] [INFO] [utils.py:827:see_memory_usage] Before Building Model [default0]:[2022-09-05 14:34:36,921] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [default0]:[2022-09-05 14:34:36,921] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.07 GB, percent = 7.2% [default0]:SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None [default0]:Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=1, model=0): 5, ProcessCoord(pipe=1, data=2, model=0): 6, ProcessCoord(pipe=1, data=3, model=0): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=1, model=0): 9, ProcessCoord(pipe=2, data=2, model=0): 10, ProcessCoord(pipe=2, data=3, model=0): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=1, model=0): 13, ProcessCoord(pipe=3, data=2, model=0): 14, ProcessCoord(pipe=3, data=3, model=0): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=1, model=0): 17, ProcessCoord(pipe=4, data=2, model=0): 18, ProcessCoord(pipe=4, data=3, model=0): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=1, model=0): 21, ProcessCoord(pipe=5, data=2, model=0): 22, ProcessCoord(pipe=5, data=3, model=0): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=1, model=0): 25, ProcessCoord(pipe=6, data=2, model=0): 26, ProcessCoord(pipe=6, data=3, model=0): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=1, model=0): 29, ProcessCoord(pipe=7, data=2, model=0): 30, ProcessCoord(pipe=7, data=3, model=0): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=1, model=0): 33, ProcessCoord(pipe=8, data=2, model=0): 34, ProcessCoord(pipe=8, data=3, model=0): 35, ProcessCoord(pipe=9, data=0, 
model=0): 36, ProcessCoord(pipe=9, data=1, model=0): 37, ProcessCoord(pipe=9, data=2, model=0): 38, ProcessCoord(pipe=9, data=3, model=0): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=1, model=0): 41, ProcessCoord(pipe=10, data=2, model=0): 42, ProcessCoord(pipe=10, data=3, model=0): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=1, model=0): 45, ProcessCoord(pipe=11, data=2, model=0): 46, ProcessCoord(pipe=11, data=3, model=0): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=1, model=0): 49, ProcessCoord(pipe=12, data=2, model=0): 50, ProcessCoord(pipe=12, data=3, model=0): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=1, model=0): 53, ProcessCoord(pipe=13, data=2, model=0): 54, ProcessCoord(pipe=13, data=3, model=0): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=1, model=0): 57, ProcessCoord(pipe=14, data=2, model=0): 58, ProcessCoord(pipe=14, data=3, model=0): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=1, model=0): 61, ProcessCoord(pipe=15, data=2, model=0): 62, ProcessCoord(pipe=15, data=3, model=0): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=1, model=0): 65, ProcessCoord(pipe=16, data=2, model=0): 66, ProcessCoord(pipe=16, data=3, model=0): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=1, model=0): 69, ProcessCoord(pipe=17, data=2, model=0): 70, ProcessCoord(pipe=17, data=3, model=0): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=1, model=0): 73, ProcessCoord(pipe=18, data=2, model=0): 74, ProcessCoord(pipe=18, data=3, model=0): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=1, model=0): 77, ProcessCoord(pipe=19, data=2, model=0): 78, ProcessCoord(pipe=19, data=3, model=0): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=1, model=0): 81, 
ProcessCoord(pipe=20, data=2, model=0): 82, ProcessCoord(pipe=20, data=3, model=0): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=1, model=0): 85, ProcessCoord(pipe=21, data=2, model=0): 86, ProcessCoord(pipe=21, data=3, model=0): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=1, model=0): 89, ProcessCoord(pipe=22, data=2, model=0): 90, ProcessCoord(pipe=22, data=3, model=0): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=1, model=0): 93, ProcessCoord(pipe=23, data=2, model=0): 94, ProcessCoord(pipe=23, data=3, model=0): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=1, model=0): 97, ProcessCoord(pipe=24, data=2, model=0): 98, ProcessCoord(pipe=24, data=3, model=0): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=1, model=0): 101, ProcessCoord(pipe=25, data=2, model=0): 102, ProcessCoord(pipe=25, data=3, model=0): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=1, model=0): 105, ProcessCoord(pipe=26, data=2, model=0): 106, ProcessCoord(pipe=26, data=3, model=0): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=1, model=0): 109, ProcessCoord(pipe=27, data=2, model=0): 110, ProcessCoord(pipe=27, data=3, model=0): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=1, model=0): 113, ProcessCoord(pipe=28, data=2, model=0): 114, ProcessCoord(pipe=28, data=3, model=0): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=1, model=0): 117, ProcessCoord(pipe=29, data=2, model=0): 118, ProcessCoord(pipe=29, data=3, model=0): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=1, model=0): 121, ProcessCoord(pipe=30, data=2, model=0): 122, ProcessCoord(pipe=30, data=3, model=0): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=1, model=0): 125, ProcessCoord(pipe=31, data=2, 
model=0): 126, ProcessCoord(pipe=31, data=3, model=0): 127, ProcessCoord(pipe=32, data=0, model=0): 128, ProcessCoord(pipe=32, data=1, model=0): 129, ProcessCoord(pipe=32, data=2, model=0): 130, ProcessCoord(pipe=32, data=3, model=0): 131, ProcessCoord(pipe=33, data=0, model=0): 132, ProcessCoord(pipe=33, data=1, model=0): 133, ProcessCoord(pipe=33, data=2, model=0): 134, ProcessCoord(pipe=33, data=3, model=0): 135, ProcessCoord(pipe=34, data=0, model=0): 136, ProcessCoord(pipe=34, data=1, model=0): 137, ProcessCoord(pipe=34, data=2, model=0): 138, ProcessCoord(pipe=34, data=3, model=0): 139, ProcessCoord(pipe=35, data=0, model=0): 140, ProcessCoord(pipe=35, data=1, model=0): 141, ProcessCoord(pipe=35, data=2, model=0): 142, ProcessCoord(pipe=35, data=3, model=0): 143, ProcessCoord(pipe=36, data=0, model=0): 144, ProcessCoord(pipe=36, data=1, model=0): 145, ProcessCoord(pipe=36, data=2, model=0): 146, ProcessCoord(pipe=36, data=3, model=0): 147, ProcessCoord(pipe=37, data=0, model=0): 148, ProcessCoord(pipe=37, data=1, model=0): 149, ProcessCoord(pipe=37, data=2, model=0): 150, ProcessCoord(pipe=37, data=3, model=0): 151, ProcessCoord(pipe=38, data=0, model=0): 152, ProcessCoord(pipe=38, data=1, model=0): 153, ProcessCoord(pipe=38, data=2, model=0): 154, ProcessCoord(pipe=38, data=3, model=0): 155, ProcessCoord(pipe=39, data=0, model=0): 156, ProcessCoord(pipe=39, data=1, model=0): 157, ProcessCoord(pipe=39, data=2, model=0): 158, ProcessCoord(pipe=39, data=3, model=0): 159, ProcessCoord(pipe=40, data=0, model=0): 160, ProcessCoord(pipe=40, data=1, model=0): 161, ProcessCoord(pipe=40, data=2, model=0): 162, ProcessCoord(pipe=40, data=3, model=0): 163, ProcessCoord(pipe=41, data=0, model=0): 164, ProcessCoord(pipe=41, data=1, model=0): 165, ProcessCoord(pipe=41, data=2, model=0): 166, ProcessCoord(pipe=41, data=3, model=0): 167, ProcessCoord(pipe=42, data=0, model=0): 168, ProcessCoord(pipe=42, data=1, model=0): 169, ProcessCoord(pipe=42, data=2, model=0): 170, 
ProcessCoord(pipe=42, data=3, model=0): 171, ProcessCoord(pipe=43, data=0, model=0): 172, ProcessCoord(pipe=43, data=1, model=0): 173, ProcessCoord(pipe=43, data=2, model=0): 174, ProcessCoord(pipe=43, data=3, model=0): 175, ProcessCoord(pipe=44, data=0, model=0): 176, ProcessCoord(pipe=44, data=1, model=0): 177, ProcessCoord(pipe=44, data=2, model=0): 178, ProcessCoord(pipe=44, data=3, model=0): 179, ProcessCoord(pipe=45, data=0, model=0): 180, ProcessCoord(pipe=45, data=1, model=0): 181, ProcessCoord(pipe=45, data=2, model=0): 182, ProcessCoord(pipe=45, data=3, model=0): 183, ProcessCoord(pipe=46, data=0, model=0): 184, ProcessCoord(pipe=46, data=1, model=0): 185, ProcessCoord(pipe=46, data=2, model=0): 186, ProcessCoord(pipe=46, data=3, model=0): 187, ProcessCoord(pipe=47, data=0, model=0): 188, ProcessCoord(pipe=47, data=1, model=0): 189, ProcessCoord(pipe=47, data=2, model=0): 190, ProcessCoord(pipe=47, data=3, model=0): 191, ProcessCoord(pipe=48, data=0, model=0): 192, ProcessCoord(pipe=48, data=1, model=0): 193, ProcessCoord(pipe=48, data=2, model=0): 194, ProcessCoord(pipe=48, data=3, model=0): 195, ProcessCoord(pipe=49, data=0, model=0): 196, ProcessCoord(pipe=49, data=1, model=0): 197, ProcessCoord(pipe=49, data=2, model=0): 198, ProcessCoord(pipe=49, data=3, model=0): 199, ProcessCoord(pipe=50, data=0, model=0): 200, ProcessCoord(pipe=50, data=1, model=0): 201, ProcessCoord(pipe=50, data=2, model=0): 202, ProcessCoord(pipe=50, data=3, model=0): 203, ProcessCoord(pipe=51, data=0, model=0): 204, ProcessCoord(pipe=51, data=1, model=0): 205, ProcessCoord(pipe=51, data=2, model=0): 206, ProcessCoord(pipe=51, data=3, model=0): 207, ProcessCoord(pipe=52, data=0, model=0): 208, ProcessCoord(pipe=52, data=1, model=0): 209, ProcessCoord(pipe=52, data=2, model=0): 210, ProcessCoord(pipe=52, data=3, model=0): 211, ProcessCoord(pipe=53, data=0, model=0): 212, ProcessCoord(pipe=53, data=1, model=0): 213, ProcessCoord(pipe=53, data=2, model=0): 214, 
ProcessCoord(pipe=53, data=3, model=0): 215, ProcessCoord(pipe=54, data=0, model=0): 216, ProcessCoord(pipe=54, data=1, model=0): 217, ProcessCoord(pipe=54, data=2, model=0): 218, ProcessCoord(pipe=54, data=3, model=0): 219, ProcessCoord(pipe=55, data=0, model=0): 220, ProcessCoord(pipe=55, data=1, model=0): 221, ProcessCoord(pipe=55, data=2, model=0): 222, ProcessCoord(pipe=55, data=3, model=0): 223, ProcessCoord(pipe=56, data=0, model=0): 224, ProcessCoord(pipe=56, data=1, model=0): 225, ProcessCoord(pipe=56, data=2, model=0): 226, ProcessCoord(pipe=56, data=3, model=0): 227, ProcessCoord(pipe=57, data=0, model=0): 228, ProcessCoord(pipe=57, data=1, model=0): 229, ProcessCoord(pipe=57, data=2, model=0): 230, ProcessCoord(pipe=57, data=3, model=0): 231, ProcessCoord(pipe=58, data=0, model=0): 232, ProcessCoord(pipe=58, data=1, model=0): 233, ProcessCoord(pipe=58, data=2, model=0): 234, ProcessCoord(pipe=58, data=3, model=0): 235, ProcessCoord(pipe=59, data=0, model=0): 236, ProcessCoord(pipe=59, data=1, model=0): 237, ProcessCoord(pipe=59, data=2, model=0): 238, ProcessCoord(pipe=59, data=3, model=0): 239, ProcessCoord(pipe=60, data=0, model=0): 240, ProcessCoord(pipe=60, data=1, model=0): 241, ProcessCoord(pipe=60, data=2, model=0): 242, ProcessCoord(pipe=60, data=3, model=0): 243, ProcessCoord(pipe=61, data=0, model=0): 244, ProcessCoord(pipe=61, data=1, model=0): 245, ProcessCoord(pipe=61, data=2, model=0): 246, ProcessCoord(pipe=61, data=3, model=0): 247, ProcessCoord(pipe=62, data=0, model=0): 248, ProcessCoord(pipe=62, data=1, model=0): 249, ProcessCoord(pipe=62, data=2, model=0): 250, ProcessCoord(pipe=62, data=3, model=0): 251, ProcessCoord(pipe=63, data=0, model=0): 252, ProcessCoord(pipe=63, data=1, model=0): 253, ProcessCoord(pipe=63, data=2, model=0): 254, ProcessCoord(pipe=63, data=3, model=0): 255, ProcessCoord(pipe=64, data=0, model=0): 256, ProcessCoord(pipe=64, data=1, model=0): 257, ProcessCoord(pipe=64, data=2, model=0): 258, 
ProcessCoord(pipe=64, data=3, model=0): 259, ProcessCoord(pipe=65, data=0, model=0): 260, ProcessCoord(pipe=65, data=1, model=0): 261, ProcessCoord(pipe=65, data=2, model=0): 262, ProcessCoord(pipe=65, data=3, model=0): 263, ProcessCoord(pipe=66, data=0, model=0): 264, ProcessCoord(pipe=66, data=1, model=0): 265, ProcessCoord(pipe=66, data=2, model=0): 266, ProcessCoord(pipe=66, data=3, model=0): 267, ProcessCoord(pipe=67, data=0, model=0): 268, ProcessCoord(pipe=67, data=1, model=0): 269, ProcessCoord(pipe=67, data=2, model=0): 270, ProcessCoord(pipe=67, data=3, model=0): 271, ProcessCoord(pipe=68, data=0, model=0): 272, ProcessCoord(pipe=68, data=1, model=0): 273, ProcessCoord(pipe=68, data=2, model=0): 274, ProcessCoord(pipe=68, data=3, model=0): 275, ProcessCoord(pipe=69, data=0, model=0): 276, ProcessCoord(pipe=69, data=1, model=0): 277, ProcessCoord(pipe=69, data=2, model=0): 278, ProcessCoord(pipe=69, data=3, model=0): 279, ProcessCoord(pipe=70, data=0, model=0): 280, ProcessCoord(pipe=70, data=1, model=0): 281, ProcessCoord(pipe=70, data=2, model=0): 282, ProcessCoord(pipe=70, data=3, model=0): 283, ProcessCoord(pipe=71, data=0, model=0): 284, ProcessCoord(pipe=71, data=1, model=0): 285, ProcessCoord(pipe=71, data=2, model=0): 286, ProcessCoord(pipe=71, data=3, model=0): 287} [default0]:[2022-09-05 14:34:40,789] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer|embedding [default0]:stage=0 layers=3 [default0]: 0: _to_float16 [default0]: 1: EmbeddingPipe [default0]: 2: [default0]:stage=1 layers=1 [default0]: 3: ParallelTransformerLayerPipe [default0]:stage=2 layers=1 [default0]: 4: ParallelTransformerLayerPipe [default0]:stage=3 layers=1 [default0]: 5: ParallelTransformerLayerPipe [default0]:stage=4 layers=1 [default0]: 6: ParallelTransformerLayerPipe [default0]:stage=5 layers=1 [default0]: 7: ParallelTransformerLayerPipe [default0]:stage=6 layers=1 [default0]: 8: ParallelTransformerLayerPipe 
[default0]:stage=7 layers=1 [default0]: 9: ParallelTransformerLayerPipe [default0]:stage=8 layers=1 [default0]: 10: ParallelTransformerLayerPipe [default0]:stage=9 layers=1 [default0]: 11: ParallelTransformerLayerPipe [default0]:stage=10 layers=1 [default0]: 12: ParallelTransformerLayerPipe [default0]:stage=11 layers=1 [default0]: 13: ParallelTransformerLayerPipe [default0]:stage=12 layers=1 [default0]: 14: ParallelTransformerLayerPipe [default0]:stage=13 layers=1 [default0]: 15: ParallelTransformerLayerPipe [default0]:stage=14 layers=1 [default0]: 16: ParallelTransformerLayerPipe [default0]:stage=15 layers=1 [default0]: 17: ParallelTransformerLayerPipe [default0]:stage=16 layers=1 [default0]: 18: ParallelTransformerLayerPipe [default0]:stage=17 layers=1 [default0]: 19: ParallelTransformerLayerPipe [default0]:stage=18 layers=1 [default0]: 20: ParallelTransformerLayerPipe [default0]:stage=19 layers=1 [default0]: 21: ParallelTransformerLayerPipe [default0]:stage=20 layers=1 [default0]: 22: ParallelTransformerLayerPipe [default0]:stage=21 layers=1 [default0]: 23: ParallelTransformerLayerPipe [default0]:stage=22 layers=1 [default0]: 24: ParallelTransformerLayerPipe [default0]:stage=23 layers=1 [default0]: 25: ParallelTransformerLayerPipe [default0]:stage=24 layers=1 [default0]: 26: ParallelTransformerLayerPipe [default0]:stage=25 layers=1 [default0]: 27: ParallelTransformerLayerPipe [default0]:stage=26 layers=1 [default0]: 28: ParallelTransformerLayerPipe [default0]:stage=27 layers=1 [default0]: 29: ParallelTransformerLayerPipe [default0]:stage=28 layers=1 [default0]: 30: ParallelTransformerLayerPipe [default0]:stage=29 layers=1 [default0]: 31: ParallelTransformerLayerPipe [default0]:stage=30 layers=1 [default0]: 32: ParallelTransformerLayerPipe [default0]:stage=31 layers=1 [default0]: 33: ParallelTransformerLayerPipe [default0]:stage=32 layers=1 [default0]: 34: ParallelTransformerLayerPipe [default0]:stage=33 layers=1 [default0]: 35: ParallelTransformerLayerPipe 
[default0]:stage=34 layers=1 [default0]: 36: ParallelTransformerLayerPipe [default0]:stage=35 layers=1 [default0]: 37: ParallelTransformerLayerPipe [default0]:stage=36 layers=1 [default0]: 38: ParallelTransformerLayerPipe [default0]:stage=37 layers=1 [default0]: 39: ParallelTransformerLayerPipe [default0]:stage=38 layers=1 [default0]: 40: ParallelTransformerLayerPipe [default0]:stage=39 layers=1 [default0]: 41: ParallelTransformerLayerPipe [default0]:stage=40 layers=1 [default0]: 42: ParallelTransformerLayerPipe [default0]:stage=41 layers=1 [default0]: 43: ParallelTransformerLayerPipe [default0]:stage=42 layers=1 [default0]: 44: ParallelTransformerLayerPipe [default0]:stage=43 layers=1 [default0]: 45: ParallelTransformerLayerPipe [default0]:stage=44 layers=1 [default0]: 46: ParallelTransformerLayerPipe [default0]:stage=45 layers=1 [default0]: 47: ParallelTransformerLayerPipe [default0]:stage=46 layers=1 [default0]: 48: ParallelTransformerLayerPipe [default0]:stage=47 layers=1 [default0]: 49: ParallelTransformerLayerPipe [default0]:stage=48 layers=1 [default0]: 50: ParallelTransformerLayerPipe [default0]:stage=49 layers=1 [default0]: 51: ParallelTransformerLayerPipe [default0]:stage=50 layers=1 [default0]: 52: ParallelTransformerLayerPipe [default0]:stage=51 layers=1 [default0]: 53: ParallelTransformerLayerPipe [default0]:stage=52 layers=1 [default0]: 54: ParallelTransformerLayerPipe [default0]:stage=53 layers=1 [default0]: 55: ParallelTransformerLayerPipe [default0]:stage=54 layers=1 [default0]: 56: ParallelTransformerLayerPipe [default0]:stage=55 layers=1 [default0]: 57: ParallelTransformerLayerPipe [default0]:stage=56 layers=1 [default0]: 58: ParallelTransformerLayerPipe [default0]:stage=57 layers=1 [default0]: 59: ParallelTransformerLayerPipe [default0]:stage=58 layers=1 [default0]: 60: ParallelTransformerLayerPipe [default0]:stage=59 layers=1 [default0]: 61: ParallelTransformerLayerPipe [default0]:stage=60 layers=1 [default0]: 62: ParallelTransformerLayerPipe 
[default0]:stage=61 layers=1
[default0]: 63: ParallelTransformerLayerPipe
[default0]:stage=62 layers=1
[default0]: 64: ParallelTransformerLayerPipe
[default0]:stage=63 layers=1
[default0]: 65: ParallelTransformerLayerPipe
[default0]:stage=64 layers=1
[default0]: 66: ParallelTransformerLayerPipe
[default0]:stage=65 layers=1
[default0]: 67: ParallelTransformerLayerPipe
[default0]:stage=66 layers=1
[default0]: 68: ParallelTransformerLayerPipe
[default0]:stage=67 layers=1
[default0]: 69: ParallelTransformerLayerPipe
[default0]:stage=68 layers=1
[default0]: 70: ParallelTransformerLayerPipe
[default0]:stage=69 layers=1
[default0]: 71: ParallelTransformerLayerPipe
[default0]:stage=70 layers=3
[default0]: 72: ParallelTransformerLayerPipe
[default0]: 73: undo
[default0]: 74: MixedFusedLayerNorm
[default0]:stage=71 layers=2
[default0]: 75: EmbeddingPipe
[default0]: 76: float16_to_fp32
[default0]: loss: CrossEntropy
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja...
[default7]:Building extension module utils...
[default7]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default6]:Loading extension module utils...
[default6]:Time to load utils op: 0.33902740478515625 seconds
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.339141845703125 seconds
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.33951473236083984 seconds
[default7]:ninja: no work to do.
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.3023979663848877 seconds
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.3283090591430664 seconds
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.3281819820404053 seconds
[default2]:Loading extension module utils...
[default2]:Time to load utils op: 0.3284413814544678 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.3282942771911621 seconds
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:No modifications detected for re-loaded extension module utils, skipping build step...
[default6]:Loading extension module utils...
[default4]:No modifications detected for re-loaded extension module utils, skipping build step...
[default4]:Loading extension module utils...
[default4]:Time to load utils op: 0.000537872314453125 seconds
[default6]:Time to load utils op: 0.0005209445953369141 seconds
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:No modifications detected for re-loaded extension module utils, skipping build step...
[default7]:Loading extension module utils...
[default7]:Time to load utils op: 0.0005154609680175781 seconds
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:No modifications detected for re-loaded extension module utils, skipping build step...
[default5]:Loading extension module utils...
[default5]:Time to load utils op: 0.00042819976806640625 seconds
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0009112358093261719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0008270740509033203 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008294582366943359 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008313655853271484 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default2]:Building extension module utils... [default2]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20645952224731445 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2034616470336914 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2026677131652832 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20273423194885254 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2064223289489746 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2063589096069336 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20646119117736816 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20315051078796387 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20323538780212402 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20313262939453125 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20334434509277344 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20352697372436523 seconds [default0]:Loading extension module utils... [default2]:Loading extension module utils... [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default4]:Loading extension module utils... 
[default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2046806812286377 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2046668529510498 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20455217361450195 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20467758178710938 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20543408393859863 seconds [default2]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20240044593811035 seconds [default2]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2053842544555664 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20230603218078613 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20550203323364258 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20545721054077148 seconds [default3]:Loading extension module utils... [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Loading extension module utils... [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default4]:Loading extension module utils... [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default2]:ninja: no work to do. [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2523202896118164 seconds [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default4]:Loading extension module utils... 
[default7]:Loading extension module utils... [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2402346134185791 seconds [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20684218406677246 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20685958862304688 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20692896842956543 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20691227912902832 seconds [default7]:Loading extension module utils... [default0]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20255804061889648 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20249128341674805 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default4]:Loading extension module utils... [default5]:Loading extension module utils... [default1]:Loading extension module utils... [default6]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20248889923095703 seconds [default7]:Loading extension module utils... [default3]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2414088249206543 seconds [default2]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.24544000625610352 seconds [default4]:Loading extension module utils... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.24121356010437012 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20256543159484863 seconds [default2]:Loading extension module utils... [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20908761024475098 seconds [default2]:Time to load utils op: 0.2090749740600586 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.2090747356414795 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2028484344482422 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20322895050048828 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2025761604309082 seconds [default0]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20781636238098145 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20774292945861816 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21803498268127441 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21799087524414062 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2186565399169922 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20257282257080078 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20274996757507324 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21856021881103516 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20766806602478027 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.20859456062316895 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22925019264221191 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.22924590110778809 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.22924399375915527 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.21990513801574707 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.22061443328857422 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.22030377388000488 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.22134923934936523 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2202761173248291 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.22923779487609863 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2077333927154541 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20757508277893066 seconds [default1]:Time to load utils op: 0.20851397514343262 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20907998085021973 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20270204544067383 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20292305946350098 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.22047162055969238 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.22113323211669922 seconds [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.219893217086792 seconds [default0]:Loading extension module utils... [default3]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20547866821289062 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20538067817687988 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20548295974731445 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20538902282714844 seconds [default0]:[2022-09-05 14:34:42,528] [INFO] [utils.py:827:see_memory_usage] After Building Model [default0]:[2022-09-05 14:34:42,529] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:34:42,529] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.45 GB, percent = 7.2% [default0]:setting training iterations to 3100 [default0]:> learning rate decay style: constant [default0]:DeepSpeed is enabled. [default0]:[2022-09-05 14:34:42,530] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.1+8b2a6371, git-hash=8b2a6371, git-branch=master [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20255827903747559 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20221829414367676 seconds [default0]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20262646675109863 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20227742195129395 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3061995506286621 seconds [default0]:Time to load utils op: 0.202592134475708 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3061394691467285 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.20271039009094238 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20730113983154297 seconds [default1]:Time to load utils op: 0.20728445053100586 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20241856575012207 seconds [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default0]:Time to load utils op: 0.20650625228881836 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3051338195800781 seconds [default1]:Time to load utils op: 0.20603203773498535 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3053727149963379 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30532240867614746 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3051419258117676 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20474863052368164 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20272254943847656 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.205366849899292 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20229053497314453 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20789313316345215 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20254850387573242 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2038891315460205 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2038276195526123 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.3122429847717285 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3121979236602783 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20788192749023438 seconds [default6]:Loading extension module utils... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2214524745941162 seconds [default6]:Time to load utils op: 0.22144794464111328 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20541739463806152 seconds [default0]:Time to load utils op: 0.23442339897155762 seconds [default2]:Time to load utils op: 0.23438644409179688 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20725584030151367 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20725321769714355 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20514202117919922 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20524072647094727 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20711731910705566 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20588946342468262 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20691943168640137 seconds [default1]:Time to load utils op: 0.23442959785461426 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20693373680114746 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20727849006652832 seconds [default5]:Loading extension module utils... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.20709013938903809 seconds [default5]:Time to load utils op: 0.2071514129638672 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2072441577911377 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20694732666015625 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3121917247772217 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30620384216308594 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31208372116088867 seconds [default4]:Time to load utils op: 0.22374296188354492 seconds [default6]:Time to load utils op: 0.22278666496276855 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2214515209197998 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20227289199829102 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20239806175231934 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20261025428771973 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20248937606811523 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2040424346923828 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20601105690002441 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20359253883361816 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20291566848754883 seconds [default2]:Time to load utils op: 0.20591282844543457 seconds [default0]:Time to load utils op: 0.2066357135772705 seconds [default2]:Time to load utils op: 0.2023448944091797 seconds [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.3061206340789795 seconds [default3]:Time to load utils op: 0.2025744915008545 seconds [default7]:Time to load utils op: 0.2214643955230713 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.3053164482116699 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.31381821632385254 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20250296592712402 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20297622680664062 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20252561569213867 seconds [default4]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2028825283050537 seconds [default4]:Time to load utils op: 0.30532288551330566 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20830774307250977 seconds [default3]:Time to load utils op: 0.2344348430633545 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20594048500061035 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2082843780517578 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.2082808017730713 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20244145393371582 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2026691436767578 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.3051154613494873 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.31394433975219727 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20234274864196777 seconds [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.20248746871948242 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.3053321838378906 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.31397199630737305 seconds [default6]:Time to load utils op: 0.22909259796142578 seconds [default3]:Time to load utils op: 0.2025902271270752 seconds [default5]:Time to load utils op: 0.2291545867919922 seconds [default4]:Time to load utils op: 0.22983431816101074 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20233392715454102 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2024846076965332 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2024400234222412 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.20232248306274414 seconds [default3]:Time to load utils op: 0.20648789405822754 seconds [default4]:Time to load utils op: 0.24057698249816895 seconds [default5]:Time to load utils op: 0.2234818935394287 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30454039573669434 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004937648773193359 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.21631550788879395 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.21393680572509766 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.305727481842041 seconds [default1]:Loading extension module utils... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.20258569717407227 seconds [default1]:Time to load utils op: 0.20261931419372559 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20255303382873535 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.21312451362609863 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30489230155944824 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3057687282562256 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20827627182006836 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3058192729949951 seconds [default7]:Time to load utils op: 0.222733736038208 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3065948486328125 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3058207035064697 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.21701979637145996 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.305588960647583 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3056364059448242 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2025458812713623 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2025306224822998 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30587005615234375 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2132585048675537 seconds [default4]:Time to load utils op: 0.21259593963623047 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3058788776397705 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.3138546943664551 seconds [default5]:Time to load utils op: 0.24013328552246094 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30851101875305176 seconds [default4]:Time to load utils op: 0.2336728572845459 seconds [default7]:Time to load utils op: 0.23367547988891602 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.2023000717163086 seconds [default5]:Time to load utils op: 0.23354125022888184 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20592713356018066 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2025301456451416 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.21671247482299805 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20247721672058105 seconds [default4]:Time to load utils op: 0.2463364601135254 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20245766639709473 seconds [default7]:Time to load utils op: 0.2409963607788086 seconds [default6]:Time to load utils op: 0.23367547988891602 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3056604862213135 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2028815746307373 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.3052711486816406 seconds [default7]:Time to load utils op: 0.22896218299865723 seconds [default1]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3083930015563965 seconds [default1]:Time to load utils op: 0.3084862232208252 seconds [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.31369519233703613 seconds [default0]:Time to load utils op: 0.20873284339904785 seconds [default5]:Time to load utils op: 0.21169304847717285 seconds [default4]:Time to load utils op: 0.21910429000854492 seconds [default6]:Time to load utils op: 0.21144771575927734 seconds [default7]:Time to load utils op: 0.21155428886413574 seconds [default5]:Time to load utils op: 0.21841788291931152 seconds [default7]:Time to load utils op: 0.21770811080932617 seconds [default1]:Time to load utils op: 0.22469687461853027 seconds [default4]:Time to load utils op: 0.21604371070861816 seconds [default6]:Time to load utils op: 0.21817731857299805 seconds [default3]:Time to load utils op: 0.22466683387756348 seconds [default2]:Time to load utils op: 0.22301220893859863 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.31408214569091797 seconds [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.3136277198791504 seconds [default7]:Time to load utils op: 0.2147505283355713 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.31392765045166016 seconds [default6]:Time to load utils op: 0.21465086936950684 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.2058546543121338 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20589089393615723 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3059067726135254 seconds [default0]:Time to load utils op: 0.22310233116149902 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.3052675724029541 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20985007286071777 seconds [default5]:Time to load utils op: 0.21497488021850586 seconds [default2]:Loading extension module utils... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.30522942543029785 seconds [default2]:Time to load utils op: 0.30487799644470215 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21704554557800293 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.2131667137145996 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20956134796142578 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.2025766372680664 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.2096846103668213 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3056521415710449 seconds [default3]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.3048593997955322 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.3047928810119629 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30487918853759766 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.30480360984802246 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.20254755020141602 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20256853103637695 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20258808135986328 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20886540412902832 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.2026834487915039 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20887279510498047 seconds [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.2088768482208252 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.20887064933776855 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30587077140808105 seconds [default3]:Time to load utils op: 0.3044860363006592 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006473064422607422 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.21004509925842285 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.20968866348266602 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.20967316627502441 seconds [default4]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.20995616912841797 seconds [default4]:Time to load utils op: 0.20974206924438477 seconds [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.20267295837402344 seconds [default0]:Time to load utils op: 0.20236682891845703 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.20252704620361328 seconds [default3]:Time to load utils op: 0.2024848461151123 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004630088806152344 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005817413330078125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005900859832763672 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006840229034423828 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000537872314453125 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005266666412353516 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00047326087951660156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005633831024169922 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004916191101074219 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005807876586914062 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005559921264648438 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007381439208984375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004458427429199219 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00045752525329589844 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005893707275390625 seconds [default0]:Time to load utils op: 0.0007147789001464844 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00042319297790527344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00045871734619140625 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00045490264892578125 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.000438690185546875 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004992485046386719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0009629726409912109 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006704330444335938 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008697509765625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00045871734619140625 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006837844848632812 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006887912750244141 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009098052978515625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007166862487792969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006299018859863281 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005624294281005859 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007176399230957031 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007352828979492188 seconds [default6]:Time to load utils op: 0.0008263587951660156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005943775177001953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011081695556640625 seconds [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004794597625732422 seconds [default5]:Time to load utils op: 0.0005977153778076172 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0010364055633544922 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005941390991210938 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007367134094238281 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Time to load utils op: 0.0008027553558349609 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.000682830810546875 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006959438323974609 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006213188171386719 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005636215209960938 seconds [default2]:Loading extension module utils... [default4]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0004482269287109375 seconds [default2]:Time to load utils op: 0.0004775524139404297 seconds [default4]:Time to load utils op: 0.0005550384521484375 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006506443023681641 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007560253143310547 seconds [default2]:Time to load utils op: 0.0005445480346679688 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005524158477783203 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006818771362304688 seconds [default7]:Time to load utils op: 0.0005512237548828125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006325244903564453 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007569789886474609 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005862712860107422 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0009284019470214844 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Time to load utils op: 0.00044608116149902344 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005245208740234375 seconds [default7]:Time to load utils op: 0.0005450248718261719 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005736351013183594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005693435668945312 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005550384521484375 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0003917217254638672 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008392333984375 seconds [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004355907440185547 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006318092346191406 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005004405975341797 seconds [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0007956027984619141 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0014693737030029297 seconds [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default0]:Loading extension module utils... [default1]:Time to load utils op: 0.0004057884216308594 seconds [default0]:Time to load utils op: 0.0004875659942626953 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0012493133544921875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007078647613525391 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007851123809814453 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0005071163177490234 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006253719329833984 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006017684936523438 seconds [default0]:Time to load utils op: 0.0004673004150390625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005209445953369141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006346702575683594 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.0005600452423095703 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004839897155761719 seconds [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005145072937011719 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005064010620117188 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004932880401611328 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0007605552673339844 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... 
[default6]:Time to load utils op: 0.0004379749298095703 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007855892181396484 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004467964172363281 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0008847713470458984 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:Time to load utils op: 0.0010619163513183594 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005323886871337891 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006029605865478516 seconds [default0]:Time to load utils op: 0.0005822181701660156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004303455352783203 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0007886886596679688 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004987716674804688 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005788803100585938 seconds [default3]:Loading extension module utils... 
[default3]:Time to load utils op: 0.0014562606811523438 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00037741661071777344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Time to load utils op: 0.0010592937469482422 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004336833953857422 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004239082336425781 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00048232078552246094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.0003941059112548828 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004589557647705078 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007078647613525391 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004296302795410156 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006246566772460938 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005078315734863281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005819797515869141 seconds [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005402565002441406 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006322860717773438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00060272216796875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006015300750732422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005645751953125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0007026195526123047 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003459453582763672 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005223751068115234 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00038623809814453125 seconds [default0]:Time to load utils op: 0.0004448890686035156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00048232078552246094 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... 
[default1]:Time to load utils op: 0.00044274330139160156 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0010857582092285156 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006566047668457031 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004000663757324219 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003726482391357422 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... 
[default0]:Time to load utils op: 0.00048470497131347656 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:Loading extension module utils... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003876686096191406 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007658004760742188 seconds [default0]:Time to load utils op: 0.0004506111145019531 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0005183219909667969 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006830692291259766 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... 
[default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0011818408966064453 seconds [default2]:Time to load utils op: 0.0003993511199951172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00039386749267578125 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00043892860412597656 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007390975952148438 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004355907440185547 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00046753883361816406 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004923343658447266 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006113052368164062 seconds [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0004947185516357422 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004200935363769531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004868507385253906 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0003523826599121094 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0003819465637207031 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006396770477294922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006139278411865234 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00046753883361816406 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005342960357666016 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004887580871582031 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0016944408416748047 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.001739501953125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0014886856079101562 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004956722259521484 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0015401840209960938 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004906654357910156 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004680156707763672 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0003998279571533203 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004584789276123047 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003800392150878906 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004687309265136719 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default0]:Time to load utils op: 0.0005974769592285156 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004889965057373047 seconds [default6]:Time to load utils op: 0.00045371055603027344 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0012652873992919922 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default4]:Time to load utils op: 0.00046825408935546875 seconds [default7]:Time to load utils op: 0.0011119842529296875 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... 
[default5]:Time to load utils op: 0.0006208419799804688 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0007195472717285156 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00118255615234375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0007007122039794922 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default4]:Loading extension module utils... [default7]:Loading extension module utils... 
[default7]:Time to load utils op: 0.00040602684020996094 seconds [default5]:Time to load utils op: 0.000606536865234375 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Time to load utils op: 0.0009129047393798828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0003387928009033203 seconds [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0006775856018066406 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0011441707611083984 seconds [default6]:Time to load utils op: 0.0007214546203613281 seconds [default2]:Time to load utils op: 0.0005190372467041016 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006265640258789062 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0003705024719238281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005123615264892578 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00035071372985839844 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008325576782226562 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004942417144775391 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004138946533203125 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008645057678222656 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004971027374267578 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.000400543212890625 seconds [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0003864765167236328 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004889965057373047 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005986690521240234 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006961822509765625 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005013942718505859 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0011279582977294922 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00043392181396484375 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007214546203613281 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005929470062255859 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008153915405273438 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005578994750976562 seconds [default5]:Time to load utils op: 0.0003864765167236328 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007605552673339844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00033354759216308594 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006315708160400391 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.00048351287841796875 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004420280456542969 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0005369186401367188 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004780292510986328 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.000827789306640625 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006430149078369141 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0008571147918701172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00041675567626953125 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0004398822784423828 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004208087921142578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00041937828063964844 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00043892860412597656 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0003719329833984375 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008618831634521484 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.00045943260192871094 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005536079406738281 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0004527568817138672 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0006198883056640625 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.00036644935607910156 seconds [default4]:Time to load utils op: 0.0006811618804931641 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.00042939186096191406 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006079673767089844 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005774497985839844 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005421638488769531 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008077621459960938 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005295276641845703 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0007207393646240234 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.00043702125549316406 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.00045871734619140625 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004260540008544922 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0006709098815917969 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006785392761230469 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0006692409515380859 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005364418029785156 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005123615264892578 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005207061767578125 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0005853176116943359 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0005424022674560547 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0006008148193359375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0008189678192138672 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0009386539459228516 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0005972385406494141 seconds [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006034374237060547 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006315708160400391 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0004315376281738281 seconds [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:No modifications detected for re-loaded extension module utils, skipping build step... [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.0004892349243164062 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0005061626434326172 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.0004253387451171875 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005152225494384766 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0006577968597412109 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0006334781646728516 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0006074905395507812 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0008001327514648438 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.0004916191101074219 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.0008196830749511719 seconds [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:No modifications detected for re-loaded extension module utils, skipping build step... [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.00051116943359375 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005078315734863281 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0007662773132324219 seconds [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default1]:Loading extension module utils... [default1]:Time to load utils op: 0.0005834102630615234 seconds [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0008292198181152344 seconds [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.0008537769317626953 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-05 14:34:43,247] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [default0]:[2022-09-05 14:34:43,248] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [default0]:[2022-09-05 14:34:43,248] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer [default0]:[2022-09-05 14:34:43,248] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__} [default0]:[2022-09-05 14:34:43,248] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... 
[default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default0]:[2022-09-05 14:34:43,279] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer [default0]:[2022-09-05 14:34:43,280] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:34:43,280] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:Emitting ninja build file /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113/utils/build.ninja... [default4]:Building extension module utils... [default4]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [default4]:ninja: no work to do. [default4]:Loading extension module utils... [default4]:Time to load utils op: 0.27287864685058594 seconds [default6]:Loading extension module utils... [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.30625486373901367 seconds [default6]:Time to load utils op: 0.3067328929901123 seconds [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.306549072265625 seconds [default3]:Loading extension module utils... [default3]:Time to load utils op: 0.30813145637512207 seconds [default1]:Loading extension module utils... [default2]:Loading extension module utils... [default2]:Time to load utils op: 0.30864715576171875 seconds [default1]:Time to load utils op: 0.30808329582214355 seconds [default4]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default4]:No modifications detected for re-loaded extension module utils, skipping build step... [default4]:Loading extension module utils... 
[default4]:Time to load utils op: 0.0005245208740234375 seconds [default6]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default7]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:No modifications detected for re-loaded extension module utils, skipping build step... [default6]:Loading extension module utils... [default6]:Time to load utils op: 0.0004334449768066406 seconds [default7]:Loading extension module utils... [default7]:Time to load utils op: 0.0005042552947998047 seconds [default5]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default5]:No modifications detected for re-loaded extension module utils, skipping build step... [default5]:Loading extension module utils... [default5]:Time to load utils op: 0.00043511390686035156 seconds [default0]:Loading extension module utils... [default0]:Time to load utils op: 0.30251097679138184 seconds [default0]:[2022-09-05 14:34:43,611] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 [default0]:[2022-09-05 14:34:43,612] [INFO] [utils.py:828:see_memory_usage] MA 6.7 GB Max_MA 6.7 GB CA 6.7 GB Max_CA 7 GB [default0]:[2022-09-05 14:34:43,612] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.61 GB, percent = 7.3% [default2]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root... [default1]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:No modifications detected for re-loaded extension module utils, skipping build step... [default2]:Loading extension module utils... 
[default2]:Time to load utils op: 0.0023543834686279297 seconds
[default1]:Loading extension module utils...
[default1]:Time to load utils op: 0.002321481704711914 seconds
[default3]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default3]:No modifications detected for re-loaded extension module utils, skipping build step...
[default3]:Loading extension module utils...
[default3]:Time to load utils op: 0.002380847930908203 seconds
[default0]:[2022-09-05 14:34:43,674] [INFO] [utils.py:827:see_memory_usage] after initializing group 0
[default0]:[2022-09-05 14:34:43,674] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB
[default0]:[2022-09-05 14:34:43,674] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,702] [INFO] [utils.py:827:see_memory_usage] before initializing group 1
[default0]:[2022-09-05 14:34:43,702] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB
[default0]:[2022-09-05 14:34:43,703] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,730] [INFO] [utils.py:827:see_memory_usage] after initializing group 1
[default0]:[2022-09-05 14:34:43,730] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB
[default0]:[2022-09-05 14:34:43,730] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,757] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer
[default0]:[2022-09-05 14:34:43,757] [INFO] [utils.py:828:see_memory_usage] MA 23.45 GB Max_MA 23.45 GB CA 25.12 GB Max_CA 25 GB
[default0]:[2022-09-05 14:34:43,757] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,840] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer
[default0]:[2022-09-05 14:34:43,840] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB
[default0]:[2022-09-05 14:34:43,840] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,868] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer
[default0]:[2022-09-05 14:34:43,868] [INFO] [utils.py:828:see_memory_usage] MA 30.15 GB Max_MA 30.15 GB CA 31.82 GB Max_CA 32 GB
[default0]:[2022-09-05 14:34:43,868] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 36.62 GB, percent = 7.3%
[default0]:[2022-09-05 14:34:43,868] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[default0]:[2022-09-05 14:34:43,869] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[default0]:[2022-09-05 14:34:43,869] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[default0]:[2022-09-05 14:34:43,869] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:987:print] DeepSpeedEngine configuration:
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] activation_checkpointing_config {
[default0]:    "partition_activations": false,
[default0]:    "contiguous_memory_optimization": false,
[default0]:    "cpu_checkpointing": false,
[default0]:    "number_checkpoints": null,
[default0]:    "synchronize_checkpoint_boundary": false,
[default0]:    "profile": false
[default0]:}
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] amp_enabled .................. False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] amp_params ................... False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] autotuning_config ............ {
[default0]:    "enabled": false,
[default0]:    "start_step": null,
[default0]:    "end_step": null,
[default0]:    "metric_path": null,
[default0]:    "arg_mappings": null,
[default0]:    "metric": "throughput",
[default0]:    "model_info": null,
[default0]:    "results_dir": null,
[default0]:    "exps_dir": null,
[default0]:    "overwrite": true,
[default0]:    "fast": true,
[default0]:    "start_profile_step": 3,
[default0]:    "end_profile_step": 5,
[default0]:    "tuner_type": "gridsearch",
[default0]:    "tuner_early_stopping": 5,
[default0]:    "tuner_num_trials": 50,
[default0]:    "model_info_path": null,
[default0]:    "mp_size": 1,
[default0]:    "max_train_batch_size": null,
[default0]:    "min_train_batch_size": 1,
[default0]:    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
[default0]:    "min_train_micro_batch_size_per_gpu": 1,
[default0]:    "num_tuning_micro_batch_sizes": 3
[default0]:}
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] bfloat16_enabled ............. True
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] checkpoint_tag_validation_enabled True
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] checkpoint_tag_validation_fail False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] comms_config .................
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] communication_data_type ...... None
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] curriculum_enabled ........... False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] curriculum_params ............ False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] dataloader_drop_last ......... False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] disable_allgather ............ False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] dump_state ................... False
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] dynamic_loss_scale_args ...... None
[default0]:[2022-09-05 14:34:43,869] [INFO] [config.py:991:print] eigenvalue_enabled ........... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_gas_boundary_resolution 1
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_layer_name ........ bert.encoder.layer
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_layer_num ......... 0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_max_iter .......... 100
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_stability ......... 1e-06
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_tol ............... 0.01
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] eigenvalue_verbose ........... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] elasticity_enabled ........... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] flops_profiler_config ........ {
[default0]:    "enabled": false,
[default0]:    "profile_step": 1,
[default0]:    "module_depth": -1,
[default0]:    "top_modules": 1,
[default0]:    "detailed": true,
[default0]:    "output_file": null
[default0]:}
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] fp16_auto_cast ............... None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] fp16_enabled ................. False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] fp16_master_weights_and_gradients False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] global_rank .................. 0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] gradient_accumulation_steps .. 512
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] gradient_clipping ............ 1.0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] gradient_predivide_factor .... 1.0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] initial_dynamic_scale ........ 1
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] load_universal_checkpoint .... True
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] loss_scale ................... 1.0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] memory_breakdown ............. False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] monitor_config ...............
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] nebula_config ................ {
[default0]:    "enabled": false,
[default0]:    "persistent_storage_path": null,
[default0]:    "persistent_time_interval": 100,
[default0]:    "num_of_version_in_retention": 2,
[default0]:    "enable_nebula_load": true,
[default0]:    "load_path": null
[default0]:}
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] optimizer_legacy_fusion ...... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] optimizer_name ............... None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] optimizer_params ............. None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] pld_enabled .................. False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] pld_params ................... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] prescale_gradients ........... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] scheduler_name ............... None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] scheduler_params ............. None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] sparse_attention ............. None
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] sparse_gradients_enabled ..... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] steps_per_print .............. 2000
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] train_batch_size ............. 2048
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] train_micro_batch_size_per_gpu 1
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] wall_clock_breakdown ......... False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] world_size ................... 4
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] zero_allow_untested_optimizer False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] zero_enabled ................. False
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:991:print] zero_optimization_stage ...... 0
[default0]:[2022-09-05 14:34:43,870] [INFO] [config.py:976:print_user_config] json = {
[default0]:    "train_micro_batch_size_per_gpu": 1,
[default0]:    "train_batch_size": 2.048000e+03,
[default0]:    "gradient_clipping": 1.0,
[default0]:    "zero_optimization": {
[default0]:        "stage": 0
[default0]:    },
[default0]:    "bf16": {
[default0]:        "enabled": true
[default0]:    },
[default0]:    "steps_per_print": 2.000000e+03,
[default0]:    "wall_clock_breakdown": false,
[default0]:    "checkpoint": {
[default0]:        "load_universal": true
[default0]:    }
[default0]:}
[default0]:Using /gpfs7kw/linkhome/rech/genhug01/unj46ad/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
[default0]:No modifications detected for re-loaded extension module utils, skipping build step...
[default0]:Loading extension module utils...
[default0]:Time to load utils op: 0.0004627704620361328 seconds
[default0]:[2022-09-05 14:34:43,871] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=512 micro_batch_size=1
[default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=12 STAGE=3 LAYERS=1 [5, 6) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=16 STAGE=4 LAYERS=1 [6, 7) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=84 STAGE=21 LAYERS=1 [23, 24) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=184 STAGE=46 LAYERS=1 [48, 49) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M)
[default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=32 STAGE=8 LAYERS=1 [10, 11) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=36 STAGE=9 LAYERS=1 [11, 12) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=280 STAGE=70 LAYERS=3 [72, 75) STAGE_PARAMS=2466465792 (2466.466M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=56 STAGE=14 LAYERS=1 [16, 17) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=208 STAGE=52 LAYERS=1 [54, 55) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=212 STAGE=53 LAYERS=1 [55, 56) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=152 STAGE=38 LAYERS=1 [40, 41) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=132 STAGE=33 LAYERS=1 [35, 36) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=232 STAGE=58 LAYERS=1 [60, 61) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=236 STAGE=59 LAYERS=1 [61, 62) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=128 STAGE=32 LAYERS=1 [34, 35) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=188 STAGE=47 LAYERS=1 [49, 50) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=284 STAGE=71 LAYERS=2 [75, 77) STAGE_PARAMS=3596615680 (3596.616M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=176 STAGE=44 LAYERS=1 [46, 47) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=156 STAGE=39 LAYERS=1 [41, 42) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=20 STAGE=5 LAYERS=1 [7, 8) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=60 STAGE=15 LAYERS=1 [17, 18) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=164 STAGE=41 LAYERS=1 [43, 44) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=96 STAGE=24 LAYERS=1 [26, 27) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=272 STAGE=68 LAYERS=1 [70, 71) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=40 STAGE=10 LAYERS=1 [12, 13) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=136 STAGE=34 LAYERS=1 [36, 37) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=264 STAGE=66 LAYERS=1 [68, 69) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=140 STAGE=35 LAYERS=1 [37, 38) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=104 STAGE=26 LAYERS=1 [28, 29) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=180 STAGE=45 LAYERS=1 [47, 48) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=100 STAGE=25 LAYERS=1 [27, 28) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=204 STAGE=51 LAYERS=1 [53, 54) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=48 STAGE=12 LAYERS=1 [14, 15) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=276 STAGE=69 LAYERS=1 [71, 72) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=228 STAGE=57 LAYERS=1 [59, 60) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=64 STAGE=16 LAYERS=1 [18, 19) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=108 STAGE=27 LAYERS=1 [29, 30) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=148 STAGE=37 LAYERS=1 [39, 40) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=144 STAGE=36 LAYERS=1 [38, 39) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=224 STAGE=56 LAYERS=1 [58, 59) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=248 STAGE=62 LAYERS=1 [64, 65) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=44 STAGE=11 LAYERS=1 [13, 14) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=160 STAGE=40 LAYERS=1 [42, 43) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=252 STAGE=63 LAYERS=1 [65, 66) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=220 STAGE=55 LAYERS=1 [57, 58) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=256 STAGE=64 LAYERS=1 [66, 67) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=216 STAGE=54 LAYERS=1 [56, 57) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=260 STAGE=65 LAYERS=1 [67, 68) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=52 STAGE=13 LAYERS=1 [15, 16) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=68 STAGE=17 LAYERS=1 [19, 20) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=268 STAGE=67 LAYERS=1 [69, 70) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=120 STAGE=30 LAYERS=1 [32, 33) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=124 STAGE=31 LAYERS=1 [33, 34) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=72 STAGE=18 LAYERS=1 [20, 21) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=76 STAGE=19 LAYERS=1 [21, 22) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=4 STAGE=1 LAYERS=1 [3, 4) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=3 [0, 3) STAGE_PARAMS=3596644352 (3596.644M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=240 STAGE=60 LAYERS=1 [62, 63) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=192 STAGE=48 LAYERS=1 [50, 51) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=80 STAGE=20 LAYERS=1 [22, 23) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,453] [INFO] [engine.py:145:__init__] RANK=244 STAGE=61 LAYERS=1 [63, 64) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=112 STAGE=28 LAYERS=1 [30, 31) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=28 STAGE=7 LAYERS=1 [9, 10) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=24 STAGE=6 LAYERS=1 [8, 9) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=8 STAGE=2 LAYERS=1 [4, 5) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=200 STAGE=50 LAYERS=1 [52, 53) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=88 STAGE=22 LAYERS=1 [24, 25) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=92 STAGE=23 LAYERS=1 [25, 26) STAGE_PARAMS=2466437120 (2466.437M) 
TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=196 STAGE=49 LAYERS=1 [51, 52) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=168 STAGE=42 LAYERS=1 [44, 45) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,446] [INFO] [engine.py:145:__init__] RANK=172 STAGE=43 LAYERS=1 [45, 46) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default4]:[2022-09-05 14:34:44,447] [INFO] [engine.py:145:__init__] RANK=116 STAGE=29 LAYERS=1 [31, 32) STAGE_PARAMS=2466437120 (2466.437M) TOTAL_PARAMS=179843887104 (179843.887M) UNIQUE_PARAMS=176247271424 (176247.271M) [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default0]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default2]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default4]:[2022-09-05 14:34:45,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default1]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default2]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default1]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default2]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default6]:[2022-09-05 14:34:45,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default4]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default6]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default6]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default6]:[2022-09-05 14:34:45,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default5]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default3]:[2022-09-05 14:34:45,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... 
[default0]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default0]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default0]:[2022-09-05 14:34:45,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default5]:[2022-09-05 14:34:45,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default4]:[2022-09-05 14:34:45,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. 
[default7]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default7]:[2022-09-05 14:34:45,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt... [default7]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default5]:[2022-09-05 14:34:45,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal/mp_rank_00_model_states.pt. [default3]:[2022-09-05 14:34:53,590] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 155 [default3]:[2022-09-05 14:34:55,904] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 99 [default3]:[2022-09-05 14:34:56,192] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 123 [default7]:[2022-09-05 14:34:56,350] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 103 [default3]:[2022-09-05 14:34:56,474] [INFO] 
[engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 251 [default2]:[2022-09-05 14:34:56,821] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 154 [default3]:[2022-09-05 14:34:57,437] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 195 [default2]:[2022-09-05 14:34:57,639] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 226 [default3]:[2022-09-05 14:34:57,625] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 227 [default7]:[2022-09-05 14:34:57,731] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 231 [default6]:[2022-09-05 14:34:57,841] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 158 [default7]:[2022-09-05 14:34:57,840] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 159 [default3]:[2022-09-05 14:34:57,945] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 243 [default3]:[2022-09-05 14:34:57,874] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 235 [default6]:[2022-09-05 14:34:58,053] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 230 [default7]:[2022-09-05 14:34:58,035] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 271 [default6]:[2022-09-05 14:34:58,032] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 270 [default2]:[2022-09-05 14:34:58,082] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 50 [default3]:[2022-09-05 14:34:58,115] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 51 [default3]:[2022-09-05 14:34:58,202] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 203 [default4]:[2022-09-05 14:34:58,328] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 100 [default5]:[2022-09-05 14:34:58,326] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 101 [default7]:[2022-09-05 14:34:58,581] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 79 [default3]:[2022-09-05 14:34:58,586] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 67 [default2]:[2022-09-05 14:34:58,734] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 106 [default3]:[2022-09-05 14:34:58,748] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 107 [default7]:[2022-09-05 14:34:58,777] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 247 [default1]:[2022-09-05 14:34:58,802] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 153 [default7]:[2022-09-05 14:34:58,778] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 127 [default3]:[2022-09-05 14:34:58,864] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 131 [default6]:[2022-09-05 14:34:58,824] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 78 [default5]:[2022-09-05 14:34:59,000] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 157 [default4]:[2022-09-05 14:34:59,009] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 156 [default7]:[2022-09-05 14:34:59,090] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 223 [default6]:[2022-09-05 14:34:59,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 222 [default7]:[2022-09-05 14:34:59,190] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 199 [default0]:[2022-09-05 14:34:59,244] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 96 [default1]:[2022-09-05 14:34:59,245] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 97 [default7]:[2022-09-05 14:34:59,327] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 135 [default6]:[2022-09-05 14:34:59,280] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 278 [default7]:[2022-09-05 14:34:59,279] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 279 [default7]:[2022-09-05 14:34:59,442] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 183 [default7]:[2022-09-05 14:34:59,457] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 47 [default6]:[2022-09-05 14:34:59,458] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 262 [default6]:[2022-09-05 14:34:59,443] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 182 [default7]:[2022-09-05 14:34:59,457] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 263 [default6]:[2022-09-05 14:34:59,542] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 198 [default2]:[2022-09-05 14:34:59,477] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 162 [default3]:[2022-09-05 14:34:59,485] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 163 [default6]:[2022-09-05 14:34:59,570] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 166 [default7]:[2022-09-05 14:34:59,565] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 31 [default7]:[2022-09-05 14:34:59,574] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 167 [default3]:[2022-09-05 14:34:59,631] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 147 [default3]:[2022-09-05 14:34:59,690] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 283 [default6]:[2022-09-05 14:34:59,759] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 238 [default7]:[2022-09-05 14:34:59,679] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 207 [default3]:[2022-09-05 14:34:59,756] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 219 [default0]:[2022-09-05 14:34:59,818] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 152 [default3]:[2022-09-05 14:34:59,815] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 275 [default0]:[2022-09-05 14:34:59,861] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 224 [default7]:[2022-09-05 14:34:59,963] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 175 [default6]:[2022-09-05 14:34:59,961] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 174 [default2]:[2022-09-05 14:34:59,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 18 [default3]:[2022-09-05 14:34:59,901] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 83 [default2]:[2022-09-05 14:34:59,910] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 282 [default3]:[2022-09-05 14:34:59,937] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 19 [default2]:[2022-09-05 14:34:59,908] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 178 [default3]:[2022-09-05 14:34:59,896] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 179 [default7]:[2022-09-05 14:34:59,949] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 239 [default2]:[2022-09-05 14:34:59,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 98 [default1]:[2022-09-05 14:34:59,898] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 225 [default2]:[2022-09-05 14:34:59,939] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 258 [default6]:[2022-09-05 14:35:00,049] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 94 [default7]:[2022-09-05 14:35:00,053] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 95 [default2]:[2022-09-05 14:35:00,002] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 274 [default3]:[2022-09-05 14:35:00,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 259 [default6]:[2022-09-05 14:35:00,053] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 102 [default3]:[2022-09-05 14:35:00,148] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 11 [default2]:[2022-09-05 14:35:00,134] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 10 [default2]:[2022-09-05 14:35:00,105] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 218 [default6]:[2022-09-05 14:35:00,175] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 126 [default3]:[2022-09-05 14:35:00,226] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 171 [default4]:[2022-09-05 14:35:00,268] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 124 [default5]:[2022-09-05 14:35:00,268] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 125 [default7]:[2022-09-05 14:35:00,330] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 15 [default6]:[2022-09-05 14:35:00,329] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 14 [default7]:[2022-09-05 14:35:00,343] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 119 [default3]:[2022-09-05 14:35:00,465] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 211 [default7]:[2022-09-05 14:35:00,449] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 23 [default6]:[2022-09-05 14:35:00,487] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 22 [default6]:[2022-09-05 14:35:00,541] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 118 [default3]:[2022-09-05 14:35:00,606] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 91 [default5]:[2022-09-05 14:35:00,634] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 117 [default2]:[2022-09-05 14:35:00,615] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 90 [default7]:[2022-09-05 14:35:00,621] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 215 [default6]:[2022-09-05 14:35:00,628] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 214 [default4]:[2022-09-05 14:35:00,631] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 116 [default2]:[2022-09-05 14:35:00,653] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 42 [default4]:[2022-09-05 14:35:00,575] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 228 [default7]:[2022-09-05 14:35:00,637] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 143 [default6]:[2022-09-05 14:35:00,634] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 142 [default5]:[2022-09-05 14:35:00,577] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 229 [default4]:[2022-09-05 14:35:00,632] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 220 [default5]:[2022-09-05 14:35:00,628] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 221 [default7]:[2022-09-05 14:35:00,717] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 39 [default2]:[2022-09-05 14:35:00,701] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 210 [default3]:[2022-09-05 14:35:00,718] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 43 [default2]:[2022-09-05 14:35:00,798] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 26 [default3]:[2022-09-05 14:35:00,838] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 27 [default4]:[2022-09-05 14:35:00,838] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 212 [default5]:[2022-09-05 14:35:00,839] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 213 [default6]:[2022-09-05 14:35:00,806] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 46 [default2]:[2022-09-05 14:35:00,813] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 66 [default0]:[2022-09-05 14:35:00,929] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 8 [default1]:[2022-09-05 14:35:00,864] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 9 [default5]:[2022-09-05 14:35:00,967] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 21 [default4]:[2022-09-05 14:35:00,964] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal 
for rank 20 [default4]:[2022-09-05 14:35:00,961] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 180 [default5]:[2022-09-05 14:35:00,966] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 181 [default7]:[2022-09-05 14:35:00,884] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 87 [default3]:[2022-09-05 14:35:00,911] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 139 [default2]:[2022-09-05 14:35:00,896] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 122 [default7]:[2022-09-05 14:35:00,936] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 55 [default5]:[2022-09-05 14:35:00,913] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 261 [default4]:[2022-09-05 14:35:00,949] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 260 [default3]:[2022-09-05 14:35:00,995] [INFO] [engine.py:2763:_load_zero_checkpoint] 
loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 115 [default6]:[2022-09-05 14:35:00,985] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 86 [default6]:[2022-09-05 14:35:00,981] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 62 [default7]:[2022-09-05 14:35:00,975] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 63 [default2]:[2022-09-05 14:35:00,992] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 114 [default7]:[2022-09-05 14:35:00,991] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 151 [default1]:[2022-09-05 14:35:01,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 249 [default6]:[2022-09-05 14:35:01,079] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 54 [default7]:[2022-09-05 14:35:01,081] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 71 [default1]:[2022-09-05 14:35:01,028] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 217 [default0]:[2022-09-05 14:35:01,025] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 216 [default2]:[2022-09-05 14:35:01,111] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 194 [default0]:[2022-09-05 14:35:01,092] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 128 [default1]:[2022-09-05 14:35:01,098] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 129 [default6]:[2022-09-05 14:35:01,080] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 70 [default0]:[2022-09-05 14:35:01,116] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 248 [default1]:[2022-09-05 14:35:01,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 161 [default5]:[2022-09-05 14:35:01,179] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 253 [default0]:[2022-09-05 14:35:01,097] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 160 [default0]:[2022-09-05 14:35:01,241] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 192 [default1]:[2022-09-05 14:35:01,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 89 [default0]:[2022-09-05 14:35:01,185] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 88 [default2]:[2022-09-05 14:35:01,244] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 58 [default1]:[2022-09-05 14:35:01,219] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 177 [default1]:[2022-09-05 14:35:01,179] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 209 [default0]:[2022-09-05 14:35:01,177] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 208 [default2]:[2022-09-05 14:35:01,259] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 130 [default0]:[2022-09-05 14:35:01,244] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 272 [default1]:[2022-09-05 14:35:01,238] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 273 [default4]:[2022-09-05 14:35:01,198] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 108 [default4]:[2022-09-05 14:35:01,207] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 276 [default2]:[2022-09-05 14:35:01,219] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 202 [default5]:[2022-09-05 14:35:01,215] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 277 [default6]:[2022-09-05 14:35:01,175] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 206 [default1]:[2022-09-05 14:35:01,260] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 121 [default4]:[2022-09-05 14:35:01,186] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 252 [default0]:[2022-09-05 14:35:01,241] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 120 [default5]:[2022-09-05 14:35:01,207] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 109 [default0]:[2022-09-05 14:35:01,356] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 200 [default4]:[2022-09-05 14:35:01,299] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 244 [default5]:[2022-09-05 14:35:01,310] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 29 [default5]:[2022-09-05 14:35:01,298] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 245 [default6]:[2022-09-05 14:35:01,344] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 190 [default4]:[2022-09-05 14:35:01,347] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 236 [default6]:[2022-09-05 14:35:01,310] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 110 [default7]:[2022-09-05 14:35:01,312] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 111 [default4]:[2022-09-05 14:35:01,307] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 164 [default5]:[2022-09-05 14:35:01,311] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 165 [default1]:[2022-09-05 14:35:01,351] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 201 [default6]:[2022-09-05 14:35:01,341] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 254 [default7]:[2022-09-05 14:35:01,334] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 255 [default6]:[2022-09-05 14:35:01,357] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 6 [default7]:[2022-09-05 14:35:01,384] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 7 [default5]:[2022-09-05 14:35:01,389] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 13 [default4]:[2022-09-05 14:35:01,399] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 12 [default0]:[2022-09-05 14:35:01,385] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 176 [default0]:[2022-09-05 14:35:01,373] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 104 [default5]:[2022-09-05 14:35:01,464] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 133 [default1]:[2022-09-05 14:35:01,375] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 105 [default6]:[2022-09-05 14:35:01,462] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 134 [default2]:[2022-09-05 14:35:01,433] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 250 [default4]:[2022-09-05 14:35:01,465] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 132 [default0]:[2022-09-05 14:35:01,390] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 144 [default1]:[2022-09-05 14:35:01,393] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 145 [default4]:[2022-09-05 14:35:01,470] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 68 [default2]:[2022-09-05 14:35:01,520] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 242 [default6]:[2022-09-05 14:35:01,560] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 30 [default5]:[2022-09-05 14:35:01,568] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 61 [default2]:[2022-09-05 14:35:01,531] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 266 [default4]:[2022-09-05 14:35:01,520] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 60 [default4]:[2022-09-05 14:35:01,499] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 204 [default5]:[2022-09-05 14:35:01,496] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 205 [default4]:[2022-09-05 14:35:01,525] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 148 [default3]:[2022-09-05 14:35:01,534] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 267 [default0]:[2022-09-05 14:35:01,527] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 64 [default6]:[2022-09-05 14:35:01,509] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 150 [default3]:[2022-09-05 14:35:01,535] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 75 [default1]:[2022-09-05 14:35:01,611] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 193 [default4]:[2022-09-05 14:35:01,627] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 196 [default1]:[2022-09-05 14:35:01,576] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 241 [default5]:[2022-09-05 14:35:01,607] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 173 [default4]:[2022-09-05 14:35:01,605] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 172 [default5]:[2022-09-05 14:35:01,628] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 197 [default3]:[2022-09-05 14:35:01,597] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 35 [default1]:[2022-09-05 14:35:01,582] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 113 [default0]:[2022-09-05 14:35:01,581] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 112 [default2]:[2022-09-05 14:35:01,596] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 34 [default6]:[2022-09-05 14:35:01,636] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 246 [default0]:[2022-09-05 14:35:01,618] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 40 [default5]:[2022-09-05 14:35:01,588] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 237 [default2]:[2022-09-05 14:35:01,616] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 146 [default1]:[2022-09-05 14:35:01,643] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 257 [default0]:[2022-09-05 14:35:01,641] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 256 [default4]:[2022-09-05 14:35:01,643] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 76 [default5]:[2022-09-05 14:35:01,680] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 93 [default0]:[2022-09-05 14:35:01,701] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 240 [default4]:[2022-09-05 14:35:01,681] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 92 [default0]:[2022-09-05 14:35:01,729] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 80 [default2]:[2022-09-05 14:35:01,761] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 186 [default3]:[2022-09-05 14:35:01,680] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 59 [default0]:[2022-09-05 14:35:01,735] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 264 [default1]:[2022-09-05 14:35:01,702] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 265 [default5]:[2022-09-05 14:35:01,701] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 69 [default1]:[2022-09-05 14:35:01,705] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 49 [default1]:[2022-09-05 14:35:01,783] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 65 [default0]:[2022-09-05 14:35:01,837] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 168 [default1]:[2022-09-05 14:35:01,795] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 81 [default0]:[2022-09-05 14:35:01,787] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 32 [default1]:[2022-09-05 14:35:01,789] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 33 [default0]:[2022-09-05 14:35:01,797] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 16 [default2]:[2022-09-05 14:35:01,852] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 82 [default0]:[2022-09-05 14:35:01,783] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 232 [default3]:[2022-09-05 14:35:01,824] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 187 [default2]:[2022-09-05 14:35:01,825] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 234 [default1]:[2022-09-05 14:35:01,801] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 17 [default5]:[2022-09-05 14:35:01,865] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 149 [default5]:[2022-09-05 14:35:01,874] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 77 [default1]:[2022-09-05 14:35:01,872] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 25 [default4]:[2022-09-05 14:35:01,890] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 28 [default0]:[2022-09-05 14:35:01,872] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 24 [default0]:[2022-09-05 14:35:01,925] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 280 [default0]:[2022-09-05 14:35:01,939] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 48 [default1]:[2022-09-05 14:35:01,889] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 137 [default5]:[2022-09-05 14:35:01,895] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 45 [default4]:[2022-09-05 14:35:01,891] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 44 [default4]:[2022-09-05 14:35:01,938] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 268 [default5]:[2022-09-05 14:35:01,936] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 269 [default4]:[2022-09-05 14:35:01,912] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 4 [default2]:[2022-09-05 14:35:02,005] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 170 [default4]:[2022-09-05 14:35:02,051] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 84 [default7]:[2022-09-05 14:35:02,001] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 191 [default1]:[2022-09-05 14:35:02,041] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 41 [default5]:[2022-09-05 14:35:02,068] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 141 [default5]:[2022-09-05 14:35:02,047] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 85 [default5]:[2022-09-05 14:35:02,038] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 5 [default0]:[2022-09-05 14:35:02,135] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 184 [default0]:[2022-09-05 14:35:02,094] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 136 [default5]:[2022-09-05 14:35:02,144] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 53 [default2]:[2022-09-05 14:35:02,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 138 [default0]:[2022-09-05 14:35:02,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 72 [default1]:[2022-09-05 14:35:02,095] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 73 [default2]:[2022-09-05 14:35:02,161] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 74 [default4]:[2022-09-05 14:35:02,147] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 52 [default6]:[2022-09-05 14:35:02,209] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 38 [default4]:[2022-09-05 14:35:02,262] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 188 [default4]:[2022-09-05 14:35:02,331] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 36 [default1]:[2022-09-05 14:35:02,341] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 233 [default0]:[2022-09-05 14:35:02,416] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 56 [default1]:[2022-09-05 14:35:02,419] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 57 [default1]:[2022-09-05 14:35:02,402] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 169 [default1]:[2022-09-05 14:35:02,520] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 185 [default4]:[2022-09-05 14:35:02,748] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 140 [default1]:[2022-09-05 14:35:02,728] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from 
/gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 281 [default5]:[2022-09-05 14:35:02,804] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 189 [default5]:[2022-09-05 14:35:02,907] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 37 [default7]:[2022-09-05 14:35:06,229] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 287 [default0]:[2022-09-05 14:35:06,531] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 0 [default0]:could not find arguments in the checkpoint ... 
[default0]: checkpoint version 3.0 [default5]:[2022-09-05 14:35:07,926] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 285 [default1]:[2022-09-05 14:35:08,894] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 1 [default4]:[2022-09-05 14:35:09,265] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 284 [default6]:[2022-09-05 14:35:09,757] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 286 [default2]:[2022-09-05 14:35:09,934] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 2 [default7]:time (ms) | load-checkpoint: 27267.61 [default0]: successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq at iteration 0 [default3]:[2022-09-05 14:35:12,575] [INFO] [engine.py:2763:_load_zero_checkpoint] loaded universal zero checkpoints from /gpfsscratch/rech/six/commun/checkpoints/tr13-176B-ml-t0/checkpoints/p31lossseq/global_step0_universal for rank 3 [default0]:estimated model parameters: 258.958393344 [default0]:estimated model parameters without embeddings: 0.002064384 [default0]:[after model, optimizer, and learning rate scheduler are built] datetime: 2022-09-05 14:35:12 [default0]:> building train, validation, and test datasets ... 
[default0]: > datasets target sizes (minimum size): [default0]: train: 6348800 [default0]: validation: 266240 [default0]: test: 20480 [default0]:> building train, validation, and test datasets for T0 ... [default0]: > building dataset index ... [default0]:/gpfsssd/worksf/projects/rech/six/commun/code/tr13f-6B3-ml-t0/megdslossseqnew/Megatron-DeepSpeed/megatron/utils.py:365: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings [default0]: warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.057661 seconds [default0]: number of documents: 90897616 [default0]: > dataset split: [default0]: train: [default0]: document indices in [0, 90897616) total of 90897616 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.033940 seconds [default0]: number of documents: 90897616 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.004158 seconds [default0]: number of documents: 90897616 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_train_train_indexmap_6348800ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.063 seconds [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.051908 seconds [default0]: number of documents: 15234080 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [14472376, 15234080) total of 761704 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ar/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ar_text_document_validation_pretraining_indexmap_8848ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.035 seconds [default0]: total number of 
samples: 221750 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.022337 seconds [default0]: number of documents: 6142390 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [5835270, 6142390) total of 307120 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/ca/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_ca_text_document_validation_pretraining_indexmap_3009ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.155 seconds [default0]: total number of samples: 136143 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.061407 seconds [default0]: number of documents: 26176998 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [24868148, 26176998) total of 1308850 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/code/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_code_text_document_validation_pretraining_indexmap_34858ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.018 seconds [default0]: total number of samples: 432311 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.345456 seconds [default0]: number of documents: 20844665 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [19802432, 20844665) total of 1042233 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/en/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_en_text_document_validation_pretraining_indexmap_59324ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.058 seconds [default0]: total number of samples: 521545 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.417175 seconds [default0]: number of documents: 67005817 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [63655526, 67005817) total of 3350291 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/es/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_es_text_document_validation_pretraining_indexmap_28545ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.081 seconds [default0]: total number of samples: 1740321 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.013954 seconds [default0]: number of documents: 5149795 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4892305, 5149795) total of 257490 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/eu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_eu_text_document_validation_pretraining_indexmap_418ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.052 seconds [default0]: total number of samples: 26370 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.226657 seconds [default0]: number of documents: 58847091 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [55904736, 58847091) total of 2942355 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/fr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_fr_text_document_validation_pretraining_indexmap_34929ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.044 seconds [default0]: total number of samples: 1458654 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.034893 seconds [default0]: number of documents: 12514253 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11888540, 12514253) total of 625713 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/id/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_id_text_document_validation_pretraining_indexmap_2922ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.048 seconds [default0]: total number of samples: 134071 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.034380 seconds [default0]: number of documents: 180608 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [171578, 180608) total of 9030 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-as/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-as_text_document_validation_pretraining_indexmap_30ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.020 seconds [default0]: total number of samples: 2501 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.187498 seconds [default0]: number of documents: 12303134 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [11687977, 12303134) total of 615157 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-bn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-bn_text_document_validation_pretraining_indexmap_1470ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.066 seconds [default0]: total number of samples: 157244 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.220693 seconds [default0]: number of documents: 2033057 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1931404, 2033057) total of 101653 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-gu/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-gu_text_document_validation_pretraining_indexmap_108ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.067 seconds [default0]: total number of samples: 20517 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.132378 seconds [default0]: number of documents: 26793553 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [25453875, 26793553) total of 1339678 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-hi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-hi_text_document_validation_pretraining_indexmap_1999ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.059 seconds [default0]: total number of samples: 101502 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.153274 seconds [default0]: number of documents: 3155990 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2998190, 3155990) total of 157800 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-kn/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-kn_text_document_validation_pretraining_indexmap_166ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.026 seconds [default0]: total number of samples: 44182 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.094798 seconds [default0]: number of documents: 6692522 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [6357896, 6692522) total of 334626 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ml/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ml_text_document_validation_pretraining_indexmap_277ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.040 seconds [default0]: total number of samples: 47613 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.110992 seconds [default0]: number of documents: 3017261 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2866398, 3017261) total of 150863 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-mr/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-mr_text_document_validation_pretraining_indexmap_135ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 29298 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.097743 seconds [default0]: number of documents: 3648041 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [3465639, 3648041) total of 182402 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ne/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ne_text_document_validation_pretraining_indexmap_179ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.030 seconds [default0]: total number of samples: 5659 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.082558 seconds [default0]: number of documents: 4327282 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4110918, 4327282) total of 216364 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-or/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-or_text_document_validation_pretraining_indexmap_97ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.006 seconds [default0]: total number of samples: 12423 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.177536 seconds [default0]: number of documents: 2698896 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2563951, 2698896) total of 134945 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-pa/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-pa_text_document_validation_pretraining_indexmap_137ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.013 seconds [default0]: total number of samples: 19133 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.406925 seconds [default0]: number of documents: 12767593 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [12129213, 12767593) total of 638380 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ta/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ta_text_document_validation_pretraining_indexmap_566ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.040 seconds [default0]: total number of samples: 87928 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.067767 seconds [default0]: number of documents: 4342323 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [4125207, 4342323) total of 217116 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-te/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-te_text_document_validation_pretraining_indexmap_245ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.034 seconds [default0]: total number of samples: 69780 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.111759 seconds [default0]: number of documents: 3022722 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [2871586, 3022722) total of 151136 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/indic-ur/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_indic-ur_text_document_validation_pretraining_indexmap_334ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.018 seconds [default0]: total number of samples: 22532 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.088227 seconds [default0]: number of documents: 1162568 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [1104440, 1162568) total of 58128 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/nigercongo-all/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_nigercongo-all_text_document_validation_pretraining_indexmap_85ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.009 seconds [default0]: total number of samples: 1608 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.037346 seconds [default0]: number of documents: 55294645 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [52529913, 55294645) total of 2764732 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-en/meg_ds_bigscience_tokenizer_text_document_validation_pretraining_indexmap_21773ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.148 seconds [default0]: total number of samples: 690621 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.059493 seconds [default0]: number of documents: 44855616 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [42612835, 44855616) total of 2242781 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/oscar-zh/meg_ds_bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_text_document_validation_pretraining_indexmap_14796ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.100 seconds [default0]: total number of samples: 468689 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.416632 seconds [default0]: number of documents: 31969891 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [30371396, 31969891) total of 1598495 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/pt/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_pt_text_document_validation_pretraining_indexmap_13256ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.100 seconds [default0]: total number of samples: 497625 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.345550 seconds [default0]: number of documents: 34110375 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [32404856, 34110375) total of 1705519 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/vi/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_vi_text_document_validation_pretraining_indexmap_6587ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.096 seconds [default0]: total number of samples: 125120 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.171921 seconds [default0]: number of documents: 43761623 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [41573542, 43761623) total of 2188081 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zhs/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zhs_text_document_validation_pretraining_indexmap_32355ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.150 seconds [default0]: total number of samples: 1010592 [default0]: total number of epochs: 1 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... 
[default0]: > finished creating indexed dataset in 0.061594 seconds [default0]: number of documents: 197602 [default0]: > dataset split: [default0]: validation_pretraining: [default0]: document indices in [187722, 197602) total of 9880 documents [default0]: > loading doc-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_doc_idx.npy [default0]: > loading sample-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_sample_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsdsstore/projects/rech/six/commun/merged-meg-ds_v3_pii/zht/bigscience-catalogue-data-dev_byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles_zht_text_document_validation_pretraining_indexmap_139ns_2048sl_42s_shuffle_idx.npy [default0]: loaded indexed file in 0.014 seconds [default0]: total number of samples: 4451 [default0]: total number of epochs: 1 [default0]:> building indices for blendable datasets ... 
[default0]: > sample ratios:
[default0]: dataset 0, input: 0.0330676, achieved: 0.0330676
[default0]: dataset 1, input: 0.0112421, achieved: 0.0112421
[default0]: dataset 2, input: 0.130272, achieved: 0.130272
[default0]: dataset 3, input: 0.221712, achieved: 0.221712
[default0]: dataset 4, input: 0.106678, achieved: 0.106678
[default0]: dataset 5, input: 0.00155951, achieved: 0.00155955
[default0]: dataset 6, input: 0.13054, achieved: 0.13054
[default0]: dataset 7, input: 0.010918, achieved: 0.0109181
[default0]: dataset 8, input: 0.000110214, achieved: 0.000110257
[default0]: dataset 9, input: 0.00549238, achieved: 0.00549235
[default0]: dataset 10, input: 0.000402122, achieved: 0.000402094
[default0]: dataset 11, input: 0.00747007, achieved: 0.00747007
[default0]: dataset 12, input: 0.000619047, achieved: 0.000619024
[default0]: dataset 13, input: 0.00103353, achieved: 0.0010336
[default0]: dataset 14, input: 0.000501201, achieved: 0.000501226
[default0]: dataset 15, input: 0.000667277, achieved: 0.000667231
[default0]: dataset 16, input: 0.000359281, achieved: 0.000359326
[default0]: dataset 17, input: 0.000508443, achieved: 0.000508519
[default0]: dataset 18, input: 0.00211373, achieved: 0.0021138
[default0]: dataset 19, input: 0.000912995, achieved: 0.000912961
[default0]: dataset 20, input: 0.00124543, achieved: 0.00124546
[default0]: dataset 21, input: 0.000315887, achieved: 0.00031594
[default0]: dataset 22, input: 0.0813721, achieved: 0.0813721
[default0]: dataset 23, input: 0.0552939, achieved: 0.0552939
[default0]: dataset 24, input: 0.0495415, achieved: 0.0495414
[default0]: dataset 25, input: 0.0246164, achieved: 0.0246163
[default0]: dataset 26, input: 0.120917, achieved: 0.120917
[default0]: dataset 27, input: 0.000517703, achieved: 0.000517666
[default0]:> elapsed time for building blendable dataset indices: 0.31 (sec)
[default0]: > building dataset index ...
[default0]: reading sizes...
[default0]: reading pointers...
[default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.028944 seconds [default0]: number of documents: 2940097 [default0]: > dataset split: [default0]: valid: [default0]: document indices in [0, 2940097) total of 2940097 documents [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.013566 seconds [default0]: number of documents: 2940097 [default0]: > building dataset index ... [default0]: reading sizes... [default0]: reading pointers... [default0]: reading document index... [default0]: creating numpy buffer of mmap... [default0]: creating memory view of numpy buffer... [default0]: > finished creating indexed dataset in 0.005538 seconds [default0]: number of documents: 2940097 [default1]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 
[default4]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default2]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default0]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default3]:GOTCONSUMEDSAMPLES 0 0 [default1]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default4]:GOTCONSUMEDSAMPLES 0 0 [default7]:GOTCONSUMEDSAMPLES 0 0 [default5]:GOTCONSUMEDSAMPLES 0 0 [default6]:GOTCONSUMEDSAMPLES 0 0 [default0]: > loading doc-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_batch_idx.npy [default0]: > loading shuffle-idx mapping from /gpfsscratch/rech/six/commun/bigscience-mtf/p31t0/p31t0_validation_valid_indexmap_266240ns_42s_decoder_packed_shuffle_idx.npy [default0]: loaded indexed file in 0.036 seconds [default0]:> finished creating T0 datasets ... 
[default0]:GOTCONSUMEDSAMPLES 0 0
[default0]:GOTCONSUMEDSAMPLES 0 0
[default3]:GOTCONSUMEDSAMPLES 0 0
[default6]:GOTCONSUMEDSAMPLES 0 0
[default6]:GOTCONSUMEDSAMPLES 0 0
[default5]:GOTCONSUMEDSAMPLES 0 0
[default7]:time (ms) | model-and-optimizer-setup: 35764.91 | train/valid/test-data-iterators-setup: 17426.54
[default0]:[after dataloaders are built] datetime: 2022-09-05 14:35:30
[default0]:done with setup ...
[default0]:training ...
[default0]:[after training is done] datetime: 2022-09-05 14:35:30
[default7]:-----------------------------------------------------------------------------------------------------------------------------
[default7]:validation_pretraining loss at the end of training for val data | lm loss value: 2.163589E+00 | lm loss PPL: 8.702316E+00 |
[default7]:-----------------------------------------------------------------------------------------------------------------------------
[default7]:------------------------------------------------------------------------------------------------------------
[default7]:valid loss at the end of training for val data | lm loss value: 3.992382E+00 | lm loss PPL: 5.418379E+01 |
[default7]:------------------------------------------------------------------------------------------------------------
srun: error: Slurm job 1013548 has expired
srun: Check SLURM_JOB_ID environment variable. Expired or invalid job 1013548